Results 1 - 20 of 63
1.
Front Plant Sci ; 15: 1324090, 2024.
Article in English | MEDLINE | ID: mdl-38504889

ABSTRACT

In the field of plant breeding, various machine learning models have been developed and studied to evaluate the genomic prediction (GP) accuracy of unseen phenotypes. Deep learning has shown promise. However, most studies on deep learning in plant breeding have been limited to small datasets, and only a few have explored its application in moderate-sized datasets. In this study, we aimed to address this limitation by utilizing a moderately large dataset. We examined the performance of a deep learning (DL) model and compared it with the widely used and powerful genomic best linear unbiased prediction (GBLUP) model. The goal was to assess the GP accuracy of the DL model under a five-fold cross-validation strategy and when predicting complete environments. The results revealed that the DL model outperformed the GBLUP model in terms of GP accuracy for two of the five included traits under the five-fold cross-validation strategy, with similar results for the other traits. This indicates that the DL model was superior for these specific traits. Furthermore, when predicting complete environments using the leave-one-environment-out (LOEO) approach, the DL model demonstrated competitive performance. It is worth noting that the DL model employed in this study extends a previously proposed multi-modal DL model, which had been primarily applied to image data but with small datasets. By utilizing a moderately large dataset, we were able to evaluate the performance and potential of the DL model in a more informative and more challenging plant breeding scenario.

2.
Genes (Basel) ; 15(3)2024 Feb 24.
Article in English | MEDLINE | ID: mdl-38540344

ABSTRACT

Genomic selection (GS) is revolutionizing plant breeding. However, its practical implementation is still challenging, since many factors affect its accuracy. For this reason, this research explores data augmentation with the goal of improving its accuracy. Deep neural networks with data augmentation (DA) generate synthetic data from the original training set to enlarge the training set and to improve the prediction performance of any statistical or machine learning algorithm. There is much empirical evidence of their success in many computer vision applications. For this reason, DA was explored in the context of GS using 14 real datasets. We found empirical evidence that DA is a powerful tool to improve prediction accuracy, since we improved the prediction accuracy of the top lines in the 14 datasets under study. On average, across datasets and traits, the gain in prediction performance of the DA approach relative to the conventional method in the top 20% of lines in the testing set was 108.4% in terms of the NRMSE and 107.4% in terms of the MAAPE, but a worse performance was observed on the whole testing set. We encourage more empirical evaluations to support our findings.


Subjects
Genome, Plant, Genomics, Phenotype, Machine Learning, Neural Networks, Computer
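
The abstract above reports gains in NRMSE and MAAPE on the top 20% of lines in the testing set. As a minimal sketch of how such metrics are commonly computed: the abstract does not define them, so the normalization of the RMSE by the observed mean and the ranking of "top" lines by observed value are assumptions, and the function names are illustrative only.

```python
import math


def nrmse(observed, predicted):
    """Root mean squared error divided by the mean of the observed values.

    Assumption: normalization by the observed mean; other conventions
    divide by the range or the standard deviation instead.
    """
    n = len(observed)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    return math.sqrt(mse) / (sum(observed) / n)


def maape(observed, predicted):
    """Mean arctangent absolute percentage error, bounded by pi/2.

    This sketch requires nonzero observed values.
    """
    terms = [math.atan(abs((o - p) / o)) for o, p in zip(observed, predicted)]
    return sum(terms) / len(terms)


def top_fraction(observed, predicted, fraction=0.2):
    """Keep the (observed, predicted) pairs with the highest observed values.

    Assumption: "top lines" are ranked by their observed value.
    """
    k = max(1, round(len(observed) * fraction))
    pairs = sorted(zip(observed, predicted), key=lambda t: t[0], reverse=True)[:k]
    return [o for o, _ in pairs], [p for _, p in pairs]
```

Evaluating `nrmse(*top_fraction(y, y_hat))` next to `nrmse(y, y_hat)` would reproduce the kind of contrast the abstract draws between the top lines and the whole testing set.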
3.
Theor Appl Genet ; 137(1): 21, 2024 Jan 14.
Article in English | MEDLINE | ID: mdl-38221602

ABSTRACT

KEY MESSAGE: Genomic prediction models for quantitative traits assume continuous and normally distributed phenotypes. In this research, we proposed a novel Bayesian discrete lognormal regression model. Genomic selection is a powerful tool in modern breeding programs that uses genomic information to predict the performance of individuals and select those with desirable traits. It has revolutionized animal and plant breeding, as it allows breeders to identify the best candidates without labor-intensive and time-consuming phenotypic evaluations. While several statistical models have been developed, most of them have been for quantitative continuous traits and only a few for count responses. In this paper, we propose a discrete lognormal regression model in the Bayesian context, with a Gibbs sampler to explore the corresponding posterior distribution and make the predictions. Two datasets on disease resistance in wheat are used to evaluate the proposed model against the traditional Gaussian model and a lognormal model. The results indicate that the proposed model is a competitive and natural model for predicting count genomic traits.


Subjects
Models, Genetic, Plant Breeding, Humans, Animals, Bayes Theorem, Genome, Genomics/methods, Phenotype
4.
G3 (Bethesda) ; 14(2)2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38079160

ABSTRACT

Genomic selection is revolutionizing plant breeding. However, its practical implementation is still very challenging, since predicted values do not necessarily have high correspondence to the observed phenotypic values. When the goal is to predict within-family, it is not always possible to obtain reasonable accuracies, which are of paramount importance to improve the selection process. For this reason, in this research, we propose the Adversaria-Boruta (AB) method, which combines the virtues of the adversarial validation (AV) method and the Boruta feature selection method. The AB method operates primarily by minimizing the disparity between training and testing distributions. This is accomplished by reducing the weight assigned to markers that display the most significant differences between the training and testing sets. In this way, the AB method builds a weighted genomic relationship matrix that is used in the genomic best linear unbiased predictor (GBLUP) model. The proposed AB method is compared, using 12 real data sets, with the GBLUP model that uses a nonweighted genomic relationship matrix. Our results show that the proposed AB method outperforms the GBLUP by 8.6, 19.7, and 9.8% in terms of Pearson's correlation, mean square error, and normalized root mean square error, respectively. Our results support that the proposed AB method is a useful tool to improve the prediction accuracy of a complete family; however, we encourage other investigators to evaluate the AB method to increase the empirical evidence of its potential.


Subjects
Models, Genetic, Polymorphism, Single Nucleotide, Genome, Genomics/methods, Linear Models, Phenotype, Genotype
5.
Int J Mol Sci ; 24(18)2023 Sep 07.
Article in English | MEDLINE | ID: mdl-37762107

ABSTRACT

Genomic selection (GS) plays a pivotal role in hybrid prediction. It can enhance the selection of parental lines, accurately predict hybrid performance, and harness hybrid vigor. Likewise, it can optimize breeding strategies by reducing field trial requirements, expediting hybrid development, facilitating targeted trait improvement, and enhancing adaptability to diverse environments. Leveraging genomic information empowers breeders to make informed decisions and significantly improve the efficiency and success rate of hybrid breeding programs. In order to improve genomic prediction performance, we explored the incorporation of parental phenotypic information as covariates under a multi-trait framework. Approach 1, referred to as Pmean, directly utilized parental phenotypic information without any preprocessing. Approach 2, denoted as BV, replaced the direct use of phenotypic values of both parents with their respective breeding values. An improvement in prediction performance was observed in both approaches, with a minimum 4.24% reduction in the normalized root mean square error (NRMSE), and the direct incorporation of parental phenotypic information in the Pmean approach slightly outperformed the BV approach. We also compared these two approaches using linear and nonlinear kernels, but no relevant gain was observed. Finally, our results add to the empirical evidence that the integration of parental phenotypic information helps increase the prediction performance of hybrids.


Subjects
Hybridization, Genetic, Models, Genetic, Genome, Plant, Phenotype, Genomics/methods, Plant Breeding
6.
Front Plant Sci ; 14: 1218151, 2023.
Article in English | MEDLINE | ID: mdl-37564390

ABSTRACT

Introduction: Genomic selection (GS) has gained global importance due to its potential to accelerate genetic progress and improve the efficiency of breeding programs. Objectives of the research: In this research we proposed a method to improve the prediction accuracy of tested lines in new (untested) environments. Method-1: The new method trains the model with a modified response variable (a difference of response variables) that reduces the mismatch (non-stationarity) between the training and testing distributions and thereby improves the prediction accuracy. Comparing new and conventional method: We compared the prediction accuracy of the conventional genomic best linear unbiased prediction (GBLUP) model (M1), with or without genotype × environment interaction (GE) (M1_GE; M1_NO_GE), versus the proposed method (M2) on several data sets. Results and discussion: The gain in prediction accuracy of M2 versus M1_GE and M1_NO_GE was at least 4.3% in terms of Pearson's correlation, at least 19.5% in terms of the percentage of top-yielding lines captured when the top 10% (Best10) and 20% (Best20) of lines were selected, and at least 42.29% in terms of Normalized Root Mean Squared Error (NRMSE).

7.
Front Genet ; 14: 1209275, 2023.
Article in English | MEDLINE | ID: mdl-37554404

ABSTRACT

Genomic selection (GS) is transforming plant and animal breeding, but its practical implementation for complex traits and multi-environmental trials remains challenging. To address this issue, this study investigates the integration of environmental information with genotypic information in GS. The study proposes the use of two feature selection methods (Pearson's correlation and Boruta) for the integration of environmental information. Results indicate that the simple incorporation of environmental covariates may increase or decrease prediction accuracy depending on the case. However, optimal incorporation of environmental covariates using feature selection significantly improves prediction accuracy in four out of six datasets, by between 14.25% and 218.71% in terms of Normalized Root Mean Squared Error under a leave-one-environment-out cross-validation scenario, but no relevant gain was observed in terms of Pearson's correlation. In two datasets where environmental covariates are unrelated to the response variable, feature selection is unable to enhance prediction accuracy. Therefore, the study provides empirical evidence supporting the use of feature selection to improve the prediction power of GS.

8.
Sci Rep ; 13(1): 9947, 2023 06 19.
Article in English | MEDLINE | ID: mdl-37336933

ABSTRACT

It is of paramount importance in plant breeding to have methods that deal with large numbers of predictor variables and few sample observations, as well as efficient methods for dealing with high correlation among predictors and measured traits. This paper explores the prediction performance of the partial least squares (PLS) method under single-trait (ST) and multi-trait (MT) prediction of potato traits. The first prediction was for tested lines in tested environments under a five-fold cross-validation (5FCV) strategy, and the second prediction was for tested lines in untested environments (herein denoted as leave-one-environment-out cross-validation, LOEO). Prediction performance was good (accuracy mostly > 0.5 in terms of Pearson's correlation), and the accuracy under 5FCV was better than under LOEO. Hence, we have empirical evidence that the ST and MT PLS framework is a very valuable tool for prediction in the context of potato breeding data.


Subjects
Solanum tuberosum, Solanum tuberosum/genetics, Least-Squares Analysis, Models, Genetic, Plant Breeding, Phenotype, Genomics/methods, Genotype
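
The two validation strategies named in the abstract above, five-fold cross-validation (5FCV) and leave-one-environment-out (LOEO), differ only in how records are split into training and testing sets. The following is a minimal sketch of the two splitting schemes; the function names and the record layout (one environment label per record) are assumptions for illustration, not the authors' implementation.

```python
import random


def five_fold_splits(n_records, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs for random k-fold cross-validation."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds nearly equal folds.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, test


def loeo_splits(environments):
    """Yield (held_out_env, train_idx, test_idx), one environment at a time.

    `environments` holds the environment label of each record.
    """
    for env in sorted(set(environments)):
        test = [i for i, e in enumerate(environments) if e == env]
        train = [i for i, e in enumerate(environments) if e != env]
        yield env, train, test
```

Under 5FCV every environment usually contributes records to the training set, whereas under LOEO the held-out environment is never seen during training, which is consistent with the lower LOEO accuracy reported.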
9.
Genes (Basel) ; 14(5)2023 04 28.
Article in English | MEDLINE | ID: mdl-37239363

ABSTRACT

Genomic selection (GS) is revolutionizing plant breeding. However, because it is a predictive methodology, a basic understanding of statistical machine-learning methods is necessary for its successful implementation. This methodology uses a reference population that contains both the phenotypic and genotypic information of genotypes to train a statistical machine-learning method. After optimization, this method is used to make predictions of candidate lines for which only genotypic information is available. However, due to a lack of time and appropriate training, it is difficult for breeders and scientists of related fields to learn all the fundamentals of prediction algorithms. With smart or highly automated software, it is possible for these professionals to appropriately apply any state-of-the-art statistical machine-learning method to their collected data without the need for an exhaustive understanding of statistical machine-learning methods and programming. For this reason, we introduce state-of-the-art statistical machine-learning methods using the Sparse Kernel Methods (SKM) R library, with complete guidelines on how to implement seven statistical machine-learning methods that are available in this library for genomic prediction (random forest, Bayesian models, support vector machine, gradient boosted machine, generalized linear models, partial least squares, feed-forward artificial neural networks). This guide includes details of the functions required to implement each of the methods, as well as others for easily implementing different tuning strategies, cross-validation strategies, and metrics to evaluate the prediction performance and different summary functions that compute it. A toy dataset illustrates how to implement statistical machine-learning methods and facilitates their use by professionals who do not possess a strong background in machine learning and programming.


Subjects
Plant Breeding, Software, Bayes Theorem, Genomics/methods, Machine Learning
10.
Plant Genome ; 16(2): e20346, 2023 06.
Article in English | MEDLINE | ID: mdl-37139645

ABSTRACT

Genomic selection (GS), proposed by Meuwissen et al. more than 20 years ago, is revolutionizing plant and animal breeding. Although GS has been widely accepted and applied to plant and animal breeding, there are many factors affecting its efficacy. We studied 14 real datasets to answer the practical question of whether the accuracy of genomic prediction increases when genomic information is used compared with when it is not. We found that, across traits, environments, datasets, and metrics, the average gain in prediction accuracy when genomic information is considered was 26.31%; in terms of Pearson's correlation alone the gain was 46.1%, while in terms of normalized root mean squared error alone the gain was 6.6%. If the quality of the markers and the relatedness of the individuals increase, major gains in prediction accuracy can be obtained, but if these two factors decrease, a lower increase is possible. Finally, our findings reinforce that genomic information is vital for improving the prediction accuracy and, therefore, the realized genetic gain in genomic-assisted plant breeding programs.


Subjects
Plant Breeding, Selection, Genetic, Animals, Models, Genetic, Genome, Genomics
11.
Genes (Basel) ; 14(4)2023 04 17.
Article in English | MEDLINE | ID: mdl-37107685

ABSTRACT

While sparse testing methods have been proposed by researchers to improve the efficiency of genomic selection (GS) in breeding programs, there are several factors that can hinder this. In this research, we evaluated four methods (M1-M4) for sparse testing allocation of lines to environments under multi-environmental trials for genomic prediction of unobserved lines. The sparse testing methods described in this study are applied in a two-stage analysis to build the genomic training and testing sets in a strategy that allows each location or environment to evaluate only a subset of all genotypes rather than all of them. To ensure a valid implementation, the sparse testing methods presented here require BLUEs (or BLUPs) of the lines to be computed at the first stage using an appropriate experimental design and statistical analyses in each location (or environment). The four methods for allocating cultivars to environments in the second stage were evaluated with four data sets (two large and two small) under a multi-trait and uni-trait framework. We found that the multi-trait model produced better genomic prediction (GP) accuracy than the uni-trait model and that methods M3 and M4 were slightly better than methods M1 and M2 for the allocation of lines to environments. One of the most important findings, however, was that even under a scenario where we used a training-testing relation of 15-85%, the prediction accuracy of the four methods barely decreased. This indicates that genomic sparse testing methods for data sets under these scenarios can save considerable operational and financial resources with only a small loss in precision, as shown in our cost-benefit analysis.


Subjects
Models, Genetic, Plant Breeding, Plant Breeding/methods, Genome, Plant/genetics, Phenotype, Genomics, Crops, Agricultural/genetics
12.
BMC Genomics ; 24(1): 220, 2023 Apr 26.
Article in English | MEDLINE | ID: mdl-37101112

ABSTRACT

BACKGROUND: Genomic selection (GS) is revolutionizing plant and animal breeding. However, its practical implementation is still challenging, since it is affected by many factors that, when not under control, make the methodology ineffective. Also, because it is generally formulated as a regression problem, it has low sensitivity for selecting the best candidate individuals, since a top percentage is selected according to a ranking of predicted breeding values. RESULTS: For this reason, in this paper we propose two methods to improve the prediction accuracy of this methodology. One method consists of reformulating the GS methodology (nowadays formulated as a regression problem) as a binary classification problem. The other consists only of a postprocessing step that adjusts the threshold used to classify the lines predicted on their original (continuous) scale, to guarantee similar sensitivity and specificity. The postprocessing method is applied to the predictions obtained with the conventional regression model. Both methods assume that a threshold is defined in advance to divide the training data into top lines and non-top lines; this threshold can be set as a quantile (for example, 80% or 90%) or as the average (or maximum) performance of the checks. The reformulation method requires labeling as one those lines in the training set that are equal to or larger than the specified threshold, and as zero otherwise. A binary classification model is then trained with the conventional inputs, but using the binary response variable in place of the continuous response variable. The binary classifier should be trained so as to guarantee similar sensitivity and specificity, and thus a reasonable probability of correctly classifying the top lines.
CONCLUSIONS: We evaluated the proposed models on seven data sets and found that the two proposed methods outperformed the conventional regression model by a large margin (by 402.9% in terms of sensitivity, by 110.04% in terms of F1 score, and by 70.96% in terms of Kappa coefficient, with the postprocessing method). Between the two proposed methods, the postprocessing method was better than the reformulation as a binary classification model. The simple postprocessing method avoids the need to reformulate the conventional genomic regression models as binary classification models, achieves similar or better performance, and significantly improves the selection of the top candidate lines. In general, both proposed methods are simple and can easily be adopted in practical breeding programs, with the guarantee that they will significantly improve the selection of the top candidate lines.


Subjects
Plant Breeding, Selection, Genetic, Animals, Genome, Genomics/methods, Phenotype, Models, Genetic
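
The postprocessing idea described in the abstract above (moving the classification cut-off on the predicted scale until sensitivity and specificity are similar) can be sketched as follows. This is only one possible reading of the abstract: the nearest-rank quantile rule, the search over observed prediction values, and all function names are assumptions, and the sketch requires that the data contain both top and non-top lines.

```python
def quantile(values, q):
    """Empirical quantile by a simple nearest-rank rule (assumed convention)."""
    ordered = sorted(values)
    idx = min(len(ordered) - 1, max(0, int(q * len(ordered))))
    return ordered[idx]


def sensitivity_specificity(is_top, predicted, cutoff):
    """Sensitivity and specificity when lines with predicted >= cutoff are called top."""
    tp = sum(1 for t, p in zip(is_top, predicted) if t and p >= cutoff)
    fn = sum(1 for t, p in zip(is_top, predicted) if t and p < cutoff)
    tn = sum(1 for t, p in zip(is_top, predicted) if not t and p < cutoff)
    fp = sum(1 for t, p in zip(is_top, predicted) if not t and p >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_cutoff(observed, predicted, q=0.8):
    """Cut-off on the predicted scale where sensitivity and specificity are closest."""
    top_threshold = quantile(observed, q)
    is_top = [o >= top_threshold for o in observed]
    best_cutoff, best_gap = None, float("inf")
    for cutoff in sorted(set(predicted)):
        sens, spec = sensitivity_specificity(is_top, predicted, cutoff)
        gap = abs(sens - spec)
        if gap < best_gap:
            best_cutoff, best_gap = cutoff, gap
    return best_cutoff


# Toy data: predictions are shrunken toward the mean, as regression output often is.
observed = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
predicted = [4.0, 4.2, 4.4, 4.6, 4.8, 5.0, 5.2, 5.4, 5.6, 5.8]
cutoff = balanced_cutoff(observed, predicted, q=0.8)
```

In this toy case, applying the 80% quantile of the observed values (9) directly to the shrunken predictions selects no line at all, while the adjusted cut-off of 5.6 recovers both top lines; this mirrors the low sensitivity of the unadjusted regression ranking that the abstract describes.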
13.
G3 (Bethesda) ; 13(5)2023 05 02.
Article in English | MEDLINE | ID: mdl-36869747

ABSTRACT

While several statistical machine learning methods have been developed and studied for assessing the genomic prediction (GP) accuracy of unobserved phenotypes in plant breeding research, few methods have linked genomics and phenomics (imaging). Deep learning (DL) neural networks have been developed to increase the GP accuracy of unobserved phenotypes while simultaneously accounting for the complexity of genotype-environment interaction (GE); however, unlike conventional GP models, DL has not been investigated for when genomics is linked with phenomics. In this study we used 2 wheat data sets (DS1 and DS2) to compare a novel DL method with conventional GP models. Models fitted for DS1 were GBLUP, gradient boosting machine (GBM), support vector regression (SVR) and the DL method. Results indicated that for 1 year, DL provided better GP accuracy than results obtained by the other models. However, GP accuracy obtained for other years indicated that the GBLUP model was slightly superior to the DL. DS2 is comprised only of genomic data from wheat lines tested for 3 years, 2 environments (drought and irrigated) and 2-4 traits. DS2 results showed that when predicting the irrigated environment with the drought environment, DL had higher accuracy than the GBLUP model in all analyzed traits and years. When predicting drought environment with information on the irrigated environment, the DL model and GBLUP model had similar accuracy. The DL method used in this study is novel and presents a strong degree of generalization as several modules can potentially be incorporated and concatenated to produce an output for a multi-input data structure.


Subjects
Deep Learning, Triticum, Triticum/genetics, Plant Breeding/methods, Models, Genetic, Phenotype, Genomics/methods, Genotype
14.
Genes (Basel) ; 14(2)2023 02 02.
Article in English | MEDLINE | ID: mdl-36833322

ABSTRACT

Genomic selection (GS) is a methodology that is revolutionizing plant breeding because it can select candidate genotypes without phenotypic evaluation in the field. However, its practical implementation in hybrid prediction remains challenging since many factors affect its accuracy. The main objective of this study was to research the genomic prediction accuracy of wheat hybrids by adding covariates with the hybrid parental phenotypic information to the model. Four types of different models (MA, MB, MC, and MD) with one covariate (same trait to be predicted) (MA_C, MB_C, MC_C, and MD_C) or several covariates (of the same trait and other correlated traits) (MA_AC, MB_AC, MC_AC, and MD_AC) were studied. We found that the four models with parental information outperformed models without parental information in terms of mean square error by at least 14.1% (MA vs. MA_C), 5.5% (MB vs. MB_C), 51.4% (MC vs. MC_C), and 6.4% (MD vs. MD_C) when parental information of the same trait was used and by at least 13.7% (MA vs. MA_AC), 5.3% (MB vs. MB_AC), 55.1% (MC vs. MC_AC), and 6.0% (MD vs. MD_AC) when parental information of the same trait and other correlated traits were used. Our results also show a large gain in prediction accuracy when covariates were considered using the parental phenotypic information, as opposed to marker information. Finally, our results empirically demonstrate that a significant improvement in prediction accuracy was gained by adding parental phenotypic information as covariates; however, this is expensive since, in many breeding programs, the parental phenotypic information is unavailable.


Subjects
Models, Genetic, Triticum, Triticum/genetics, Polymorphism, Single Nucleotide, Plant Breeding, Phenotype
15.
Plant Genome ; 16(2): e20305, 2023 06.
Article in English | MEDLINE | ID: mdl-36815225

ABSTRACT

Sparse testing is essential to increase the efficiency of the genomic selection methodology, as the same efficiency (in this case prediction power) can be obtained while evaluating fewer genotypes in the field. For this reason, it is important to evaluate the existing methods for allocating lines to environments. With this goal, four methods (M1-M4) to allocate lines to environments were evaluated in the context of a multi-trait genomic prediction problem: M1 denotes the allocation of a fraction (subset) of lines in all locations, M2 denotes the allocation of a fraction of lines with some shared lines in locations but not arranged based on the balanced incomplete block design (BIBD) principle, M3 denotes the random allocation of a subset of lines to locations, and M4 denotes the allocation of a subset of lines to locations using the BIBD principle. The evaluation was done using seven real multi-environment data sets common in plant breeding programs. We found that the best method was M4 and the worst was M1, while no important differences were found between M3 and M4. We concluded that M4 and M3 are efficient in the context of sparse testing for multi-trait prediction.


Subjects
Genome, Plant, Plant Breeding, Phenotype, Genotype, Genomics
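
Two of the allocation schemes listed in the abstract above are simple enough to sketch directly. The code below is a hedged illustration, assuming that M1 tests one shared subset of lines in every location and that M3 draws an independent random subset for each location; M2 and the BIBD-based M4 are not covered, and the function names and arguments are hypothetical.

```python
import random


def allocate_m1(lines, locations, fraction, seed=0):
    """Assumed reading of M1: the same subset of lines is tested in every location."""
    rng = random.Random(seed)
    k = round(len(lines) * fraction)
    subset = sorted(rng.sample(lines, k))
    return {loc: list(subset) for loc in locations}


def allocate_m3(lines, locations, fraction, seed=0):
    """Assumed reading of M3: each location gets its own random subset of lines."""
    rng = random.Random(seed)
    k = round(len(lines) * fraction)
    return {loc: sorted(rng.sample(lines, k)) for loc in locations}


lines = [f"L{i:02d}" for i in range(20)]
locations = ["A", "B", "C"]
m1 = allocate_m1(lines, locations, fraction=0.5)
m3 = allocate_m3(lines, locations, fraction=0.5)
```

Under the M1 reading, lines outside the shared subset are never observed in any location, whereas under M3 most lines tend to be observed somewhere; this may help explain why the abstract ranks M1 last.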
16.
Genes (Basel) ; 13(12)2022 12 03.
Article in English | MEDLINE | ID: mdl-36553547

ABSTRACT

Genomic prediction is revolutionizing plant breeding since candidate genotypes can be selected without the need to measure their trait in the field. A reference population that contains both phenotypic and genotypic information is used to train a statistical machine learning method, which is subsequently used to predict the breeding or phenotypic values of candidate genotypes that were only genotyped. Nevertheless, the successful implementation of the genomic selection (GS) methodology depends on many factors. One key factor is the type of statistical machine learning method used, since some are unable to capture nonlinear patterns available in the data. While kernel methods are powerful statistical machine learning algorithms that capture complex nonlinear patterns in the data, their successful implementation strongly depends on careful tuning of the hyperparameters involved. As such, in this paper we compare three methods of tuning (manual tuning, grid search, and Bayesian optimization) for the Gaussian kernel under a Bayesian best linear unbiased predictor model. We used six real datasets of wheat (Triticum aestivum L.) to compare the three tuning strategies. We found that, to obtain the major benefits of using Gaussian kernels, it is very important to perform a careful tuning process. The best prediction performance was observed when the tuning process was performed with grid search and Bayesian optimization. However, we did not observe relevant differences between the grid search and Bayesian optimization approaches. The observed gains in terms of prediction performance were between 2.1% and 27.8% across the six datasets under study.


Subjects
Genomics, Plant Breeding, Bayes Theorem, Plant Breeding/methods, Genomics/methods, Algorithms, Phenotype
17.
Genes (Basel) ; 13(12)2022 12 03.
Article in English | MEDLINE | ID: mdl-36553548

ABSTRACT

While genomic selection (GS) began revolutionizing plant breeding when it was proposed around 20 years ago, its practical implementation is still challenging, as many factors affect its accuracy. One such factor is the choice of the statistical machine learning method. For this reason, we explore the tuning process under a multi-trait framework using the Gaussian kernel with a multi-trait Bayesian Genomic Best Linear Unbiased Predictor (GBLUP) model. We explored three methods of tuning (manual, grid search, and Bayesian optimization) using five real datasets from breeding programs. We found that grid search and Bayesian optimization improved the prediction accuracy by between 1.9 and 6.8% relative to manual tuning. While the improvement in prediction accuracy can be marginal in some cases, it is very important to carry out the tuning process carefully to improve the accuracy of the GS methodology, even though this entails greater computational resources.


Subjects
Genomics, Models, Genetic, Bayes Theorem, Phenotype, Genomics/methods, Machine Learning
18.
Front Genet ; 13: 920689, 2022.
Article in English | MEDLINE | ID: mdl-36313422

ABSTRACT

In plant breeding, the need to improve the prediction of future seasons or new locations and/or environments, also denoted as "leave one environment out," is of paramount importance to increase the genetic gain in breeding programs and contribute to food and nutrition security worldwide. Genomic selection (GS) has the potential to increase the accuracy of future seasons or new locations because it is a predictive methodology. However, most statistical machine learning methods used for the task of predicting a new environment or season struggle to produce moderate or high prediction accuracies. For this reason, in this study we explore the use of the partial least squares (PLS) regression methodology for this specific task, and we benchmark its performance with the Bayesian Genomic Best Linear Unbiased Predictor (GBLUP) method. The benchmarking process was done with 14 real datasets. We found that in all datasets the PLS method outperformed the popular GBLUP method by margins between 0% (in the Indica data) and 228.28% (in the Disease data) across traits, environments, and types of predictors. Our results show great empirical evidence of the power of the PLS methodology for the prediction of future seasons or new environments.

19.
Front Genet ; 13: 966775, 2022.
Article in English | MEDLINE | ID: mdl-36134027

ABSTRACT

The genomic selection (GS) methodology proposed over 20 years ago by Meuwissen et al. (Genetics, 2001) has revolutionized plant breeding. A predictive methodology that trains statistical machine learning algorithms with phenotypic and genotypic data of a reference population and makes predictions for genotyped candidate lines, GS saves significant resources in the selection of candidate individuals. However, its practical implementation is still challenging when the plant breeder is interested in the prediction of future seasons or new locations and/or environments, which is called the "leave one environment out" issue. Furthermore, because the distributions of the training and testing sets do not match, most statistical machine learning methods struggle to produce moderate or reasonable prediction accuracies. For this reason, the main objective of this study was to explore the use of the multi-trait partial least squares (MT-PLS) regression methodology for this specific task, benchmarking its performance against the Bayesian Multi-trait Genomic Best Linear Unbiased Predictor (MT-GBLUP) method. The benchmarking process was performed with five actual data sets. We found that in all data sets the MT-PLS method outperformed the popular MT-GBLUP method by 349.8% (under predictor E + G), 484.4% (under predictor E + G + GE; where E denotes environments, G genotypes and GE the genotype by environment interaction) and 15.9% (under predictor G + GE) across traits. Our results provide empirical evidence of the power of the MT-PLS methodology for the prediction of future seasons or new environments. Furthermore, the comparison between single univariate-trait (UT) versus MT for GBLUP and PLS gave an increase in prediction accuracy of MT-GBLUP versus UT-GBLUP, but not for MT-PLS versus UT-PLS.

20.
Biomed Res Int ; 2022: 8310213, 2022.
Article in English | MEDLINE | ID: mdl-36172489

ABSTRACT

Huxley's model of simple allometry provides a parsimonious scheme for examining scaling relationships in scientific research, resource management, and species conservation endeavors. Factors including biological error, analysis method, sample size, and overall data quality can undermine the reliability of a fit of Huxley's model. Customary amendments enhance the complexity of the power function-conveyed systematic term while keeping the usual normality-borne error structure. The resulting protocols bear multiple-parameter complex allometry forms that could pose interpretative shortcomings and parameter estimation difficulties, and even when empirically pertinent, they could potentially lead to overfitting. A subsequent heavy-tailed Q-Q normal spread often remains undetected since the adequacy of a normally distributed error term remains unexplored. Previously, we promoted the advantages of keeping Huxley's model-driven systematic part while switching to a logistically distributed error term to improve fit quality. Here, we analyzed eelgrass leaf biomass and area data exhibiting a marked size-related heterogeneity, perhaps explaining a lack of systematization at data gathering. Overdispersion precluded adequacy of the logistically adapted protocol, thereby suggesting processing data through a median absolute deviation scheme aimed at removing undue replicates. Nevertheless, achieving regularity to Huxley's power function-like trend required the removal of many replicates, thereby questioning the integrity of a data cleaning approach. However, we managed to adapt the complexity of the error term to reliably identify Huxley's model-like systematic part masked by variability in data. Achieving this relied on an error term conforming to a normal mixture distribution, which successfully managed overdispersion in data. Compared to normal-complex allometry and data cleaning composites, the present arrangement delivered a coherent Q-Q normal mixture spread and a remarkable reproducibility strength of derived proxies. By keeping the analysis within Huxley's original theory, the present approach enables substantiating nondestructive allometric proxies aimed at eelgrass conservation. The viewpoint endorsed here could also make data cleaning unnecessary.


Subjects
Models, Biological, Biomass, Normal Distribution, Plant Leaves, Reproducibility of Results, Sample Size