Search | BVS Regional Portal

1.

Enhancing winter wheat prediction with genomics, phenomics and environmental data.

Montesinos-López, Osval A; Herr, Andrew W; Crossa, José; Montesinos-López, Abelardo; Carter, Arron H.

BMC Genomics ; 25(1): 544, 2024 May 31.

Article in English | MEDLINE | ID: mdl-38822262

ABSTRACT

In multi-environment prediction, when the goal is to predict a complete environment using the others as a training set, the efficiency of genomic selection (GS) falls short of expectations. Genotype-by-environment interaction makes high prediction accuracies difficult to achieve. Consequently, current efforts focus on enhancing efficiency by integrating various types of inputs, such as phenomics data, environmental information, and other omics data. In this study, we evaluated the impact of incorporating environmental information into the modeling process, in addition to genomic and phenomics information. Our evaluation encompassed five data sets of soft white winter wheat, and the results revealed a significant improvement in prediction accuracy, as measured by the normalized root mean square error (NRMSE), when environmental information was integrated. Notably, the average gain in prediction accuracy across the data sets was 49.19% in terms of NRMSE. Moreover, the observed gain ranged from 5.68% (data set 3) to 60.36% (data set 4), underscoring the substantial effect of integrating environmental information. By including genomic, phenomic, and environmental data in prediction models, plant breeding programs can improve selection efficiency across locations.

Subjects

Genomics , Phenomics , Triticum , Triticum/genetics , Genomics/methods , Gene-Environment Interaction , Phenotype , Genotype , Plant Breeding , Environment , Plant Genome
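The gains above are expressed through NRMSE. A minimal sketch of how such a comparison can be computed, assuming NRMSE is the root mean square error divided by the mean of the observed values and that the percentage gain is (baseline error / new error - 1) × 100; the paper's exact normalization and gain formula may differ, and the function names and data are hypothetical.

```python
import math


def nrmse(observed, predicted):
    """Root mean square error normalized by the mean of the observed values.

    Assumption: normalization by the observed mean; other studies divide
    by the range or the standard deviation instead.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse) / (sum(observed) / len(observed))


def relative_gain(baseline_error, new_error):
    """Percentage gain of a lower-error model over a baseline.

    Assumption: gain = (baseline / new - 1) * 100, so values above 100%
    are possible when the new error is less than half the baseline error.
    """
    return (baseline_error / new_error - 1.0) * 100.0


# Hypothetical observed yields and two sets of predictions
obs = [4.0, 5.0, 6.0, 5.0]
genomic_only = [5.0, 4.0, 5.0, 6.0]
with_env = [4.5, 4.5, 5.5, 5.5]
e0, e1 = nrmse(obs, genomic_only), nrmse(obs, with_env)
print(round(e0, 3), round(e1, 3), round(relative_gain(e0, e1), 1))
```

Under this ratio-based definition, halving the error counts as a 100% gain, which is one way gains larger than 100% can arise.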

2.

Feature engineering of environmental covariates improves plant genomic-enabled prediction.

Montesinos-López, Osval A; Crespo-Herrera, Leonardo; Pierre, Carolina Saint; Cano-Paez, Bernabe; Huerta-Prado, Gloria Isabel; Mosqueda-González, Brandon Alejandro; Ramos-Pulido, Sofia; Gerard, Guillermo; Alnowibet, Khalid; Fritsche-Neto, Roberto; Montesinos-López, Abelardo; Crossa, José.

Front Plant Sci ; 15: 1349569, 2024.

Article in English | MEDLINE | ID: mdl-38812738

ABSTRACT

Introduction: Because genomic selection (GS) is a predictive methodology, it needs to guarantee high prediction accuracy for practical implementation. However, since many factors affect its prediction performance, its practical implementation still needs to be improved in many breeding programs. For this reason, many strategies have been explored to improve its prediction performance. Methods: When environmental covariates are incorporated as inputs in genomic prediction models, this information only sometimes helps increase prediction performance. For this reason, this investigation explores the use of feature engineering on the environmental covariates to enhance the prediction performance of genomic prediction models. Results and discussion: We found that, across data sets and predictors, feature engineering reduced prediction error by 761.625% relative to including the environmental covariates without feature engineering. These results are very promising regarding the potential of feature engineering to enhance prediction accuracy. However, since a significant gain in prediction accuracy was observed in only some data sets, further research is required to ensure a robust feature engineering strategy for incorporating the environmental covariates.

3.

Multispectral and thermal infrared data, visual scores for severity of common rust symptoms, and genotypic single nucleotide polymorphism data of three F2-derived biparental doubled-haploid maize populations.

Loladze, Alexander; Rodrigues, Francelino; Petroli, Cesar D; Muñoz-Zavala, Carlos; Naranjo, Sergio; Vicente, Felix San; Gerard, Bruno; Montesinos-Lopez, Osval A; Crossa, Jose; Martini, Johannes W R.

Data Brief ; 54: 110300, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38586147

ABSTRACT

Three F2-derived biparental doubled haploid (DH) maize populations were generated for genetic mapping of resistance to common rust. The three populations share the same susceptible parent but have different resistance donor parents. Populations 1 and 3 consist of 320 lines each; population 2 consists of 260 lines. The DH lines were evaluated for their susceptibility to common rust in two years, with two replications in each year. For phenotyping, a visual score (VS) for susceptibility was assigned. Additionally, multispectral and thermal infrared data derived from an unmanned aerial vehicle (UAV) were recorded and combined into different vegetation indices ("remote sensing", RS). The DH lines were genotyped with the DArTseq method to obtain single nucleotide polymorphism (SNP) data. After quality control, 9051 markers remained. Missing values were "imputed" with the empirical mean of the marker scores at the respective locus. We used the data to compare genome-wide association studies and genomic prediction based on different phenotyping methods, that is, either VS or RS data. The data may be of interest for reuse, for instance for benchmarking genomic prediction models, for phytopathological studies of common rust, or for the specification of vegetation indices.
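The mean imputation described above can be sketched as follows, assuming markers are stored as a list of rows (one per DH line) with missing scores encoded as None. The 0/1/2 coding, the fallback for a locus with no observed scores, and the data are hypothetical.

```python
def mean_impute(markers):
    """Replace missing marker scores (None) with the mean of the observed
    scores at the same locus (column)."""
    n_markers = len(markers[0])
    means = []
    for j in range(n_markers):
        observed = [row[j] for row in markers if row[j] is not None]
        # Assumption: a locus with no observed scores is filled with 0.0
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if row[j] is None else row[j] for j in range(n_markers)]
            for row in markers]


# Hypothetical SNP scores for three lines (rows) and three loci (columns)
snps = [[0, 2, None],
        [2, None, 1],
        [None, 2, 1]]
print(mean_impute(snps))
```

Each locus is imputed independently, so the imputed value never borrows information from other markers; that keeps the method simple but ignores linkage between neighboring loci.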

4.

Deep learning methods improve genomic prediction of wheat breeding.

Montesinos-López, Abelardo; Crespo-Herrera, Leonardo; Dreisigacker, Susanna; Gerard, Guillermo; Vitale, Paolo; Saint Pierre, Carolina; Govindan, Velu; Tarekegn, Zerihun Tadesse; Flores, Moisés Chavira; Pérez-Rodríguez, Paulino; Ramos-Pulido, Sofía; Lillemo, Morten; Li, Huihui; Montesinos-López, Osval A; Crossa, Jose.

Front Plant Sci ; 15: 1324090, 2024.

Article in English | MEDLINE | ID: mdl-38504889

ABSTRACT

In the field of plant breeding, various machine learning models have been developed and studied to evaluate the genomic prediction (GP) accuracy of unseen phenotypes. Deep learning has shown promise. However, most studies on deep learning in plant breeding have been limited to small datasets, and only a few have explored its application to moderate-sized datasets. In this study, we aimed to address this limitation by using a moderately large dataset. We examined the performance of a deep learning (DL) model and compared it with the widely used and powerful genomic best linear unbiased prediction (GBLUP) model. The goal was to assess GP accuracy under a five-fold cross-validation strategy and when predicting complete environments using the DL model. The results revealed that the DL model outperformed the GBLUP model in GP accuracy for two of the five traits under five-fold cross-validation, with similar results for the other traits. This indicates the superiority of the DL model in predicting these specific traits. Furthermore, when predicting complete environments using the leave-one-environment-out (LOEO) approach, the DL model demonstrated competitive performance. It is worth noting that the DL model employed in this study extends a previously proposed multi-modal DL model, which had been applied primarily to image data, and only with small datasets. By using a moderately large dataset, we were able to evaluate the performance and potential of the DL model in a more informative and challenging plant breeding scenario.
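A minimal sketch of how a leave-one-environment-out (LOEO) scheme partitions records, assuming each phenotypic record carries an environment label. It only builds index splits and does not reproduce the study's DL or GBLUP models; the function name and labels are hypothetical.

```python
from collections import defaultdict


def leave_one_environment_out(environments):
    """Yield (held_out_env, train_idx, test_idx) for each environment.

    Each environment is predicted in turn using only the records from all
    other environments as the training set.
    """
    by_env = defaultdict(list)
    for idx, env in enumerate(environments):
        by_env[env].append(idx)
    for held_out in sorted(by_env):
        test_idx = by_env[held_out]
        train_idx = [i for i, env in enumerate(environments) if env != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical environment labels for six phenotypic records
envs = ["E1", "E2", "E1", "E3", "E2", "E3"]
for env, train, test in leave_one_environment_out(envs):
    print(env, train, test)
```

Unlike five-fold cross-validation, no record from the held-out environment ever enters training, which is why LOEO is the harder of the two scenarios.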

5.

Use of remote sensing for linkage mapping and genomic prediction for common rust resistance in maize.

Loladze, Alexander; Rodrigues, Francelino A; Petroli, Cesar D; Muñoz-Zavala, Carlos; Naranjo, Sergio; San Vicente, Felix; Gerard, Bruno; Montesinos-Lopez, Osval A; Crossa, Jose; Martini, Johannes W R.

Field Crops Res ; 308: 109281, 2024 Mar 15.

Article in English | MEDLINE | ID: mdl-38495466

ABSTRACT

Breeding for disease resistance is a central component of strategies to mitigate the impact of biotic stress on crop yield. Conventionally, genotypes of a plant population are evaluated through a labor-intensive process in which specifically trained staff assign visual scores (VS) of susceptibility (or resistance), which limits the manageable volume and the repeatability of evaluation trials. Remote sensing (RS) tools have the potential to streamline phenotyping processes and to deliver more standardized results at higher throughput. Here, we use a two-year evaluation trial of three newly developed biparental populations of maize doubled haploid (DH) lines to compare the results of genomic analyses of resistance to common rust (CR) when phenotyping is based either on conventional VS or on RS-derived (vegetation) indices. As a general observation, for each population × year combination, the broad-sense heritability of VS was greater than or very close to the maximum heritability across all RS indices. Moreover, the results of both linkage mapping and genomic prediction (GP) suggest that the VS data were of higher quality, as indicated by higher -log(p) values in the linkage studies and higher predictive abilities in genomic prediction. Nevertheless, despite the qualitative differences between the phenotyping methods, each successfully identified the same genomic region on chromosome 10 as being associated with disease resistance. This region is likely related to the known CR resistance locus Rp1. Our results indicate that RS technology can be used to streamline genetic evaluation processes for foliar disease resistance in maize. In particular, RS can potentially reduce the costs of phenotypic evaluations and increase trialing capacity.
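Broad-sense heritability is used above to compare VS with the RS indices. For reference, one common entry-mean definition for a single trial with r replications is sketched below; whether the study used exactly this estimator is an assumption. Here σ²_G denotes the genotypic variance and σ²_ε the residual variance, both typically estimated from a mixed model with random genotype effects.

```latex
H^2 = \frac{\sigma^2_G}{\sigma^2_G + \sigma^2_{\varepsilon}/r}
```

With hypothetical values σ²_G = 3, σ²_ε = 2 and r = 2, this gives H² = 3 / (3 + 1) = 0.75. Averaging over more replications shrinks the residual term, so entry-mean heritability depends on the trial design as well as on the phenotyping method.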

6.

Genomic selection in plant breeding: Key factors shaping two decades of progress.

Alemu, Admas; Åstrand, Johanna; Montesinos-López, Osval A; Isidro Y Sánchez, Julio; Fernández-Gónzalez, Javier; Tadesse, Wuletaw; Vetukuri, Ramesh R; Carlsson, Anders S; Ceplitis, Alf; Crossa, José; Ortiz, Rodomiro; Chawade, Aakash.

Mol Plant ; 17(4): 552-578, 2024 Apr 01.

Article in English | MEDLINE | ID: mdl-38475993

ABSTRACT

Genomic selection, the application of genomic prediction (GP) models to select candidate individuals, has advanced significantly in the past two decades, effectively accelerating genetic gains in plant breeding. This article provides a holistic overview of key factors that have influenced GP in plant breeding during this period. We examined the pivotal roles of training population size and genetic diversity, and their relationship with the breeding population, in determining GP accuracy. Special emphasis was placed on optimizing training population size: we explored its benefits and the diminishing returns beyond an optimum size, while considering the balance between resource allocation and maximizing prediction accuracy through current optimization algorithms. The density and distribution of single-nucleotide polymorphisms, level of linkage disequilibrium, genetic complexity, trait heritability, statistical machine-learning methods, and non-additive effects are the other vital factors. Using wheat, maize, and potato as examples, we summarize the effect of these factors on the accuracy of GP for various traits. The search for high accuracy in GP (theoretically reaching one when Pearson's correlation is used as the metric) is an active research area that is still far from optimal for various traits. We hypothesize that ultra-large genotypic and phenotypic datasets, effective training population optimization methods, and support from other omics approaches (transcriptomics, metabolomics, and proteomics), coupled with deep-learning algorithms, could overcome current limitations to achieve the highest possible prediction accuracy, making genomic selection an effective tool in plant breeding.

Subjects

Plant Genome , Plant Breeding , Humans , Plant Genome/genetics , Genetic Selection , Genomics , Phenotype , Genotype , Plants , Single Nucleotide Polymorphism/genetics

7.

Data Augmentation Enhances Plant-Genomic-Enabled Predictions.

Montesinos-López, Osval A; Solis-Camacho, Mario Alberto; Crespo-Herrera, Leonardo; Saint Pierre, Carolina; Huerta Prado, Gloria Isabel; Ramos-Pulido, Sofia; Al-Nowibet, Khalid; Fritsche-Neto, Roberto; Gerard, Guillermo; Montesinos-López, Abelardo; Crossa, José.

Genes (Basel) ; 15(3)2024 02 24.

Article in English | MEDLINE | ID: mdl-38540344

ABSTRACT

Genomic selection (GS) is revolutionizing plant breeding. However, its practical implementation is still challenging, since many factors affect its accuracy. For this reason, this research explores data augmentation with the goal of improving accuracy. Data augmentation (DA), widely used with deep neural networks, generates synthetic data from the original training set to enlarge it and improve the prediction performance of statistical or machine learning algorithms. There is much empirical evidence of its success in many computer vision applications. For this reason, DA was explored in the context of GS using 14 real datasets. We found empirical evidence that DA is a powerful tool for improving prediction accuracy, since it improved the prediction accuracy of the top lines in the 14 datasets under study. On average, across datasets and traits, the gain in prediction performance of the DA approach relative to the conventional method in the top 20% of lines in the testing set was 108.4% in terms of the normalized root mean square error (NRMSE) and 107.4% in terms of the mean arctangent absolute percentage error (MAAPE), but performance was worse on the whole testing set. We encourage more empirical evaluations to support our findings.

Subjects

Plant Genome , Genomics , Phenotype , Machine Learning , Neural Networks (Computer)
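MAAPE is less familiar than NRMSE. A minimal sketch of it, together with restricting evaluation to the top 20% of testing lines, assuming "top" means the lines with the highest observed values. The data augmentation step itself is not shown, and the helper names and data are hypothetical.

```python
import math


def maape(observed, predicted):
    """Mean arctangent absolute percentage error (in radians).

    Each term is arctan(|(A - F) / A|); observed values of zero are skipped
    to avoid division by zero (an assumption of this sketch).
    """
    terms = [math.atan(abs((a - f) / a))
             for a, f in zip(observed, predicted) if a != 0]
    return sum(terms) / len(terms)


def top_fraction(observed, predicted, fraction=0.2):
    """Keep the (observed, predicted) pairs whose observed value is in the
    top `fraction` of the testing set."""
    k = max(1, round(len(observed) * fraction))
    order = sorted(range(len(observed)), key=lambda i: observed[i], reverse=True)
    keep = order[:k]
    return [observed[i] for i in keep], [predicted[i] for i in keep]


# Hypothetical testing-set values
obs = [2.0, 4.0, 6.0, 8.0, 10.0]
pred = [2.5, 3.5, 6.5, 7.0, 9.0]
top_obs, top_pred = top_fraction(obs, pred, 0.2)
print(top_obs, top_pred, round(maape(top_obs, top_pred), 4))
```

Because arctan is bounded by π/2, MAAPE stays finite even when an observed value is close to zero, which is its usual advantage over the plain mean absolute percentage error.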

8.

Bayesian discrete lognormal regression model for genomic prediction.

Montesinos-López, Abelardo; Gutiérrez-Pulido, Humberto; Ramos-Pulido, Sofía; Montesinos-López, José Cricelio; Montesinos-López, Osval A; Crossa, José.

Theor Appl Genet ; 137(1): 21, 2024 Jan 14.

Article in English | MEDLINE | ID: mdl-38221602

ABSTRACT

KEY MESSAGE: Genomic prediction models for quantitative traits assume continuous and normally distributed phenotypes. In this research, we proposed a novel Bayesian discrete lognormal regression model. Genomic selection is a powerful tool in modern breeding programs that uses genomic information to predict the performance of individuals and select those with desirable traits. It has revolutionized animal and plant breeding, as it allows breeders to identify the best candidates without labor-intensive and time-consuming phenotypic evaluations. While several statistical models have been developed, most of them are for continuous quantitative traits and only a few for count responses. In this paper, we propose a discrete lognormal regression model in the Bayesian framework that uses a Gibbs sampler to explore the corresponding posterior distribution and make predictions. The model is applied to two wheat disease-resistance datasets and evaluated against the traditional Gaussian model and a lognormal model. The results indicate that the proposed model is a competitive and natural model for predicting count genomic traits.

Subjects

Genetic Models , Plant Breeding , Humans , Animals , Bayes Theorem , Genome , Genomics/methods , Phenotype
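A minimal sketch of one way to obtain a discrete lognormal distribution for counts: take Y = floor(X) with X lognormal, so that P(Y = y) = F(y + 1) - F(y) for y = 0, 1, 2, ..., where F is the lognormal CDF. Whether the paper uses exactly this discretization is an assumption, its Gibbs sampler is not reproduced, and μ and σ are hypothetical values.

```python
import math


def lognormal_cdf(x, mu, sigma):
    """CDF of a continuous lognormal variable; zero for x <= 0."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def discrete_lognormal_pmf(y, mu, sigma):
    """P(Y = y) for Y = floor(X), X ~ lognormal(mu, sigma), y = 0, 1, 2, ..."""
    if y < 0:
        return 0.0
    return lognormal_cdf(y + 1, mu, sigma) - lognormal_cdf(y, mu, sigma)


mu, sigma = 1.0, 0.8  # hypothetical linear predictor and scale
probs = [discrete_lognormal_pmf(y, mu, sigma) for y in range(5000)]
# The sum telescopes to F(5000), so it should be very close to 1
print(round(sum(probs), 6))
```

In a regression setting, μ would be replaced by a linear predictor (for example, an intercept plus genomic effects) for each line, which is what the Bayesian sampler would then estimate.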

9.

Multispectral-derived genotypic similarities from budget cameras allow grain yield prediction and genomic selection augmentation in single and multi-environment scenarios in spring wheat.

Mróz, Tomasz; Shafiee, Sahameh; Crossa, Jose; Montesinos-Lopez, Osval A; Lillemo, Morten.

Mol Breed ; 44(1): 5, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38230361

ABSTRACT

With abundant genomic data available, genomic selection has become routine in many plant breeding programs. Multispectral data captured by UAVs have shown potential for grain yield (GY) prediction in many plant species using machine learning; however, the possibility of using these data to augment genomic prediction models remains to be explored. To fill this gap, we collected high-throughput phenotyping (HTP) multispectral data in a genotyped, large-scale, multi-environment field trial using two cost-effective cameras. We compared side by side the predictive ability of GY prediction models that include genomic (G matrix), multispectral-derived (M matrix), and environmental (E matrix) relationships, using best linear unbiased predictor (BLUP) methodology in single and multi-environment scenarios. We discovered that M allows GY prediction comparable to that of the G matrix and that models using both G and M matrices show better accuracies and lower errors than G or M alone, in both single and multi-environment scenarios. We showed that the M matrix is not entirely environment-specific and that the genotypic relationships become more robust as more data capture sessions accumulate over the season. We discovered that the optimal time for data capture is during grain filling and that camera bands with the highest heritability are important for GY prediction using the M matrix. We showed that GY prediction can be performed using only an RGB camera, and that even a single data capture session can yield valuable data for GY prediction. This study contributes to a better understanding of multispectral data and its relationships. It provides a flexible framework for improving GS protocols without significant investments or software customization. Supplementary Information: The online version contains supplementary material available at 10.1007/s11032-024-01449-w.

10.

A marker weighting approach for enhancing within-family accuracy in genomic prediction.

Montesinos-López, Osval A; Crespo-Herrera, Leonardo; Xavier, Alencar; Godwa, Manje; Beyene, Yoseph; Pierre, Carolina Saint; de la Rosa-Santamaria, Roberto; Salinas-Ruiz, Josafhat; Gerard, Guillermo; Vitale, Paolo; Dreisigacker, Susanne; Lillemo, Morten; Grignola, Fernando; Sarinelli, Martin; Pozzo, Ezequiel; Quiroga, Marco; Montesinos-López, Abelardo; Crossa, José.

G3 (Bethesda) ; 14(2)2024 Feb 07.

Article in English | MEDLINE | ID: mdl-38079160

ABSTRACT

Genomic selection is revolutionizing plant breeding. However, its practical implementation is still very challenging, since predicted values do not necessarily correspond closely to the observed phenotypic values. When the goal is to predict within a family, reasonable accuracies are not always attainable, even though they are of paramount importance for improving the selection process. For this reason, in this research we propose the Adversaria-Boruta (AB) method, which combines the virtues of the adversarial validation (AV) method and the Boruta feature selection method. The AB method operates primarily by minimizing the disparity between the training and testing distributions. This is accomplished by reducing the weight assigned to markers that display the most significant differences between the training and testing sets. The AB method therefore builds a weighted genomic relationship matrix that is used in the genomic best linear unbiased predictor (GBLUP) model. Using 12 real data sets, the proposed AB method is compared with the GBLUP model that uses a nonweighted genomic relationship matrix. Our results show that the proposed AB method outperforms the GBLUP by 8.6, 19.7, and 9.8% in terms of Pearson's correlation, mean square error, and normalized root mean square error, respectively. Our results support that the proposed AB method is a useful tool for improving the prediction accuracy of a complete family; however, we encourage other investigators to evaluate the AB method to increase the empirical evidence of its potential.

Subjects

Genetic Models , Single Nucleotide Polymorphism , Genome , Genomics/methods , Linear Models , Phenotype , Genotype
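A minimal sketch of a weighted genomic relationship matrix, assuming a VanRaden-style construction in which each centered marker column is multiplied by a per-marker weight. The weights here are hypothetical placeholders: the AB procedure for choosing them (adversarial validation plus Boruta) is not implemented, and this is not presented as the paper's exact formula.

```python
def weighted_grm(markers, weights):
    """Weighted VanRaden-style genomic relationship matrix.

    markers: rows are lines, columns are SNPs coded 0/1/2.
    weights: one non-negative weight per SNP; all weights equal to 1 give
    the usual unweighted matrix.
    """
    n, m = len(markers), len(markers[0])
    # Allele frequency of each SNP and centered marker scores
    freqs = [sum(row[j] for row in markers) / (2.0 * n) for j in range(m)]
    z = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in markers]
    # Weighted analogue of VanRaden's 2 * sum(p * (1 - p)) scaling
    denom = 2.0 * sum(w * p * (1.0 - p) for w, p in zip(weights, freqs))
    return [[sum(weights[k] * z[i][k] * z[j][k] for k in range(m)) / denom
             for j in range(n)] for i in range(n)]


# Hypothetical SNP scores for three lines and three markers
snps = [[0, 0, 2], [2, 1, 0], [1, 2, 1]]
g_equal = weighted_grm(snps, [1.0, 1.0, 1.0])
g_down = weighted_grm(snps, [0.0, 1.0, 1.0])  # hypothetical: drop the first SNP
print(round(g_equal[0][1], 4), round(g_down[0][1], 4))
```

Down-weighting a marker reduces its contribution to every pairwise relationship, which is how a weighting scheme can de-emphasize markers whose distributions differ most between training and testing sets.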

11.

Multivariate Genomic Hybrid Prediction with Kernels and Parental Information.

Montesinos-López, Osval A; Crossa, José; Saint Pierre, Carolina; Gerard, Guillermo; Valenzo-Jiménez, Marco Alberto; Vitale, Paolo; Valladares-Cellis, Patricia Edwigis; Buenrostro-Mariscal, Raymundo; Montesinos-López, Abelardo; Crespo-Herrera, Leonardo.

Int J Mol Sci ; 24(18)2023 Sep 07.

Article in English | MEDLINE | ID: mdl-37762107

ABSTRACT

Genomic selection (GS) plays a pivotal role in hybrid prediction. It can enhance the selection of parental lines, accurately predict hybrid performance, and harness hybrid vigor. Likewise, it can optimize breeding strategies by reducing field trial requirements, expediting hybrid development, facilitating targeted trait improvement, and enhancing adaptability to diverse environments. Leveraging genomic information empowers breeders to make informed decisions and significantly improve the efficiency and success rate of hybrid breeding programs. To improve genomic prediction performance, we explored the incorporation of parental phenotypic information as covariates under a multi-trait framework. Approach 1, referred to as Pmean, directly used parental phenotypic information without any preprocessing, while approach 2, denoted BV, replaced the phenotypic values of both parents with their respective breeding values. Both approaches improved prediction performance, with a reduction of at least 4.24% in the normalized root mean square error (NRMSE), but the direct incorporation of parental phenotypic information in the Pmean approach slightly outperformed the BV approach. We also compared these two approaches using linear and nonlinear kernels, but no relevant gain was observed. Finally, our results add to the empirical evidence that integrating parental phenotypic information helps increase the prediction performance of hybrids.

Subjects

Genetic Hybridization , Genetic Models , Plant Genome , Phenotype , Genomics/methods , Plant Breeding

12.

A novel method for genomic-enabled prediction of cultivars in new environments.

Montesinos-López, Osval A; Ramos-Pulido, Sofia; Hernández-Suárez, Carlos Moisés; Mosqueda González, Brandon Alejandro; Valladares-Anguiano, Felícitas Alejandra; Vitale, Paolo; Montesinos-López, Abelardo; Crossa, José.

Front Plant Sci ; 14: 1218151, 2023.

Article in English | MEDLINE | ID: mdl-37564390

ABSTRACT

Introduction: Genomic selection (GS) has gained global importance due to its potential to accelerate genetic progress and improve the efficiency of breeding programs. Objectives of the research: In this research, we propose a method to improve the prediction accuracy of tested lines in new (untested) environments. Method: The new method trains the model with a modified response variable (a difference of response variables) that reduces the non-stationarity of the distribution between the training and testing sets, thereby improving prediction accuracy. Comparison of the new and conventional methods: We compared the prediction accuracy of the conventional genomic best linear unbiased prediction (GBLUP) model (M1), with and without genotype × environment interaction (GE) (M1_GE and M1_NO_GE, respectively), versus the proposed method (M2) on several data sets. Results and discussion: The gain in prediction accuracy of M2 over M1_GE and M1_NO_GE was at least 4.3% in terms of Pearson's correlation; at least 19.5% in terms of the percentage of top-yielding lines captured when the top 10% (Best10) and 20% (Best20) of lines were selected; and at least 42.29% in terms of normalized root mean squared error (NRMSE).
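A minimal sketch of a Best10/Best20-style metric, assuming it is the percentage of the truly top lines (ranked by observed value) that also rank among the top lines by predicted value for the same selected fraction. Ties are broken by sort order, and the function name and data are hypothetical.

```python
def best_percent_captured(observed, predicted, fraction=0.1):
    """Percentage of the top lines by observed value that are also among
    the top lines by predicted value, for a given selected fraction."""
    k = max(1, round(len(observed) * fraction))

    def top(values):
        # Indices of the k largest values
        ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
        return set(ranked[:k])

    return 100.0 * len(top(observed) & top(predicted)) / k


# Hypothetical observed and predicted yields for ten testing lines
obs = [5.1, 6.3, 4.8, 7.2, 5.9, 6.8, 4.5, 5.5, 6.0, 7.0]
pred = [5.0, 6.5, 5.2, 6.9, 6.1, 6.0, 4.7, 5.3, 6.2, 7.1]
print(best_percent_captured(obs, pred, 0.1), best_percent_captured(obs, pred, 0.2))
```

This metric depends only on rankings, so it can reward a model whose predictions are biased in scale but order the best lines correctly, which complements Pearson's correlation and NRMSE.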

13.

Do feature selection methods for selecting environmental covariables enhance genomic prediction accuracy?

Montesinos-López, Osval A; Crespo-Herrera, Leonardo; Saint Pierre, Carolina; Bentley, Alison R; de la Rosa-Santamaria, Roberto; Ascencio-Laguna, José Alejandro; Agbona, Afolabi; Gerard, Guillermo S; Montesinos-López, Abelardo; Crossa, José.

Front Genet ; 14: 1209275, 2023.

Article in English | MEDLINE | ID: mdl-37554404

ABSTRACT

Genomic selection (GS) is transforming plant and animal breeding, but its practical implementation for complex traits and multi-environment trials remains challenging. To address this issue, this study investigates the integration of environmental information with genotypic information in GS. The study proposes two feature selection methods (Pearson's correlation and Boruta) for integrating environmental information. Results indicate that simply incorporating environmental covariates may increase or decrease prediction accuracy, depending on the case. However, optimal incorporation of environmental covariates using feature selection significantly improves prediction accuracy, by between 14.25% and 218.71% in terms of normalized root mean squared error, in four out of six datasets under a leave-one-environment-out cross-validation scenario, although no relevant gain was observed in terms of Pearson's correlation. In two datasets where the environmental covariates are unrelated to the response variable, feature selection is unable to enhance prediction accuracy. Therefore, the study provides empirical evidence supporting the use of feature selection to improve the prediction power of GS.
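A minimal sketch of correlation-based selection of environmental covariates, assuming each covariate and the response are summarized per environment and a covariate is kept when the absolute Pearson correlation exceeds a threshold. The threshold, covariate names, and data are hypothetical, and the Boruta method is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_covariates(covariates, response, threshold=0.5):
    """Keep covariates whose |r| with the response exceeds `threshold`
    (a hypothetical cut-off)."""
    return [name for name, values in covariates.items()
            if abs(pearson(values, response)) > threshold]


# Hypothetical environment-level means and covariates for five environments
yield_mean = [3.0, 3.5, 4.0, 4.5, 5.0]
covs = {
    "temp_max": [33.0, 32.0, 31.0, 30.0, 29.0],
    "rain_mm": [100, 150, 200, 250, 300],
    "wind": [2.0, 1.0, 2.0, 1.0, 2.0],
}
print(select_covariates(covs, yield_mean))
```

Under leave-one-environment-out validation, the correlations should be computed from the training environments only; otherwise the held-out environment leaks into the selection step.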

14.

Partial least squares enhance multi-trait genomic prediction of potato cultivars in new environments.

Ortiz, Rodomiro; Reslow, Fredrik; Montesinos-López, Abelardo; Huicho, José; Pérez-Rodríguez, Paulino; Montesinos-López, Osval A; Crossa, José.

Sci Rep ; 13(1): 9947, 2023 06 19.

Article in English | MEDLINE | ID: mdl-37336933

ABSTRACT

It is of paramount importance in plant breeding to have methods that deal with large numbers of predictor variables and few sample observations, as well as efficient methods for dealing with high correlation among predictors and measured traits. This paper explores the prediction performance of the partial least squares (PLS) method under single-trait (ST) and multi-trait (MT) prediction of potato traits. The first prediction was for tested lines in tested environments under a five-fold cross-validation (5FCV) strategy, and the second was for tested lines in untested environments (herein denoted leave-one-environment-out cross-validation, LOEO). Prediction performance was good (Pearson's correlation mostly > 0.5), and accuracy was better under 5FCV than under LOEO. Hence, we have empirical evidence that the ST and MT PLS framework is a very valuable tool for prediction with potato breeding data.

Subjects

Solanum tuberosum , Solanum tuberosum/genetics , Least-Squares Analysis , Genetic Models , Plant Breeding , Phenotype , Genomics/methods , Genotype

15.

Statistical Machine-Learning Methods for Genomic Prediction Using the SKM Library.

Montesinos López, Osval A; Mosqueda González, Brandon Alejandro; Montesinos López, Abelardo; Crossa, José.

Genes (Basel) ; 14(5)2023 04 28.

Article in English | MEDLINE | ID: mdl-37239363

ABSTRACT

Genomic selection (GS) is revolutionizing plant breeding. However, because it is a predictive methodology, a basic understanding of statistical machine-learning methods is necessary for its successful implementation. This methodology uses a reference population that contains both the phenotypic and genotypic information of genotypes to train a statistical machine-learning method. After optimization, this method is used to make predictions for candidate lines for which only genotypic information is available. However, due to a lack of time and appropriate training, it is difficult for breeders and scientists in related fields to learn all the fundamentals of prediction algorithms. With smart or highly automated software, these professionals can appropriately implement any state-of-the-art statistical machine-learning method on their collected data without an exhaustive understanding of statistical machine-learning methods and programming. For this reason, we introduce state-of-the-art statistical machine-learning methods using the Sparse Kernel Methods (SKM) R library, with complete guidelines on how to implement seven statistical machine-learning methods available in this library for genomic prediction (random forest, Bayesian models, support vector machine, gradient boosted machine, generalized linear models, partial least squares, and feed-forward artificial neural networks). This guide details the functions required to implement each method, as well as functions for easily implementing different tuning and cross-validation strategies, metrics for evaluating prediction performance, and summary functions that compute them. A toy dataset illustrates how to implement these methods, facilitating their use by professionals who do not have a strong background in machine learning and programming.

Subjects

Plant Breeding , Software , Bayes Theorem , Genomics/methods , Machine Learning

16.

Efficacy of plant breeding using genomic information.

Montesinos-López, Osval A; Bentley, Alison R; Saint Pierre, Carolina; Crespo-Herrera, Leonardo; Rebollar-Ruellas, Leonardo; Valladares-Celis, Patricia Edwigis; Lillemo, Morten; Montesinos-López, Abelardo; Crossa, José.

Plant Genome ; 16(2): e20346, 2023 06.

Article in English | MEDLINE | ID: mdl-37139645

ABSTRACT

Genomic selection (GS), proposed by Meuwissen et al. more than 20 years ago, is revolutionizing plant and animal breeding. Although GS has been widely accepted and applied in plant and animal breeding, many factors affect its efficacy. We studied 14 real datasets to answer the practical question of whether the accuracy of genomic prediction increases when genomic information is considered compared with when it is not. We found that, across traits, environments, datasets, and metrics, the average gain in prediction accuracy when genomic information was considered was 26.31%; the gain was 46.1% in terms of Pearson's correlation alone and 6.6% in terms of normalized root mean squared error alone. If the quality of the markers and the relatedness of the individuals increase, major gains in prediction accuracy can be obtained, but if these two factors decrease, only smaller gains are possible. Finally, our findings reinforce that genomic information is vital for improving prediction accuracy and, therefore, the realized genetic gain in genomic-assisted plant breeding programs.

Subjects

Plant Breeding , Genetic Selection , Animals , Genetic Models , Genome , Genomics

17.

Genomics combined with UAS data enhances prediction of grain yield in winter wheat.

Montesinos-López, Osval A; Herr, Andrew W; Crossa, José; Carter, Arron H.

Front Genet ; 14: 1124218, 2023.

Article in English | MEDLINE | ID: mdl-37065497

ABSTRACT

With the human population continuing to increase worldwide, there is pressure to employ novel technologies to increase genetic gain in plant breeding programs that contribute to nutrition and food security. Genomic selection (GS) has the potential to increase genetic gain because it can accelerate the breeding cycle, increase the accuracy of estimated breeding values, and improve selection accuracy. In addition, with recent advances in high-throughput phenotyping in plant breeding programs, there is an opportunity to integrate genomic and phenotypic data to increase prediction accuracy. In this paper, we applied GS to winter wheat data, integrating two types of inputs: genomic and phenotypic. We observed the best grain yield prediction accuracy when combining genomic and phenotypic inputs, whereas using only genomic information fared poorly. In general, predictions with only phenotypic information were very competitive with those using both sources of information, and in many cases using only phenotypic information provided the best accuracy. Our results are encouraging because they clearly show that the prediction accuracy of GS can be enhanced by integrating high-quality phenotypic inputs into the models.

18.

Optimizing Sparse Testing for Genomic Prediction of Plant Breeding Crops.

Montesinos-López, Osval A; Saint Pierre, Carolina; Gezan, Salvador A; Bentley, Alison R; Mosqueda-González, Brandon A; Montesinos-López, Abelardo; van Eeuwijk, Fred; Beyene, Yoseph; Gowda, Manje; Gardner, Keith; Gerard, Guillermo S; Crespo-Herrera, Leonardo; Crossa, José.

Genes (Basel) ; 14(4)2023 04 17.

Article in English | MEDLINE | ID: mdl-37107685

ABSTRACT

While researchers have proposed sparse testing methods to improve the efficiency of genomic selection (GS) in breeding programs, several factors can hinder this. In this research, we evaluated four methods (M1-M4) for sparse testing allocation of lines to environments under multi-environment trials for genomic prediction of unobserved lines. The sparse testing methods described in this study are applied in a two-stage analysis to build the genomic training and testing sets, in a strategy that allows each location or environment to evaluate only a subset of all genotypes rather than all of them. To ensure a valid implementation, the sparse testing methods presented here require BLUEs (or BLUPs) of the lines to be computed in the first stage, using an appropriate experimental design and statistical analyses in each location (or environment). The four methods for allocating cultivars to environments in the second stage were evaluated with four data sets (two large and two small) under multi-trait and uni-trait frameworks. We found that the multi-trait model produced better genomic prediction (GP) accuracy than the uni-trait model and that methods M3 and M4 were slightly better than methods M1 and M2 for allocating lines to environments. One of the most important findings, however, was that even under a scenario with a training-testing ratio of 15%-85%, the prediction accuracy of the four methods barely decreased. This indicates that genomic sparse testing methods for data sets under these scenarios can save considerable operational and financial resources with only a small loss in precision, as shown by our cost-benefit analysis.

Subjects

Models, Genetic , Plant Breeding , Plant Breeding/methods , Genome, Plant/genetics , Phenotype , Genomics , Crops, Agricultural/genetics
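The abstract above does not describe how methods M1-M4 allocate lines. As an illustration only, the following Python sketch uses a hypothetical balanced rotation: lines are shuffled and each is assigned to a fixed number of consecutive environments, so every environment tests a similar-sized subset. It assumes first-stage BLUEs already exist for each observed (line, environment) cell. `sparse_allocation` and `split_cells` are hypothetical names, and this is not any of the authors' four methods.

```python
import random


def sparse_allocation(lines, envs, envs_per_line, seed=0):
    """Hypothetical balanced rotation: each line is tested in `envs_per_line`
    consecutive environments, so environments receive similar numbers of lines."""
    if not 1 <= envs_per_line <= len(envs):
        raise ValueError("envs_per_line must be between 1 and the number of environments")
    rng = random.Random(seed)
    order = list(lines)
    rng.shuffle(order)  # randomize which lines share environments
    plan = {env: set() for env in envs}
    for i, line in enumerate(order):
        for j in range(envs_per_line):
            plan[envs[(i + j) % len(envs)]].add(line)
    return plan


def split_cells(lines, envs, plan):
    """Observed (line, environment) cells form the training set; unobserved
    cells are the ones the genomic model must predict (testing set)."""
    train = [(line, env) for env in envs for line in lines if line in plan[env]]
    test = [(line, env) for env in envs for line in lines if line not in plan[env]]
    return train, test


# Hypothetical scenario: 100 lines, 20 environments, each line tested in 3.
lines = [f"G{i}" for i in range(100)]
envs = [f"E{k}" for k in range(20)]
plan = sparse_allocation(lines, envs, envs_per_line=3)
train, test = split_cells(lines, envs, plan)
```

In this scenario each environment evaluates 15 lines, and 300 of the 2,000 line-by-environment cells are observed, which corresponds to the 15%-85% training-testing scenario mentioned in the abstract.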

19.

Two simple methods to improve the accuracy of the genomic selection methodology.

Montesinos-López, Osval A; Montesinos-López, Abelardo.

BMC Genomics ; 24(1): 220, 2023 Apr 26.

Article in English | MEDLINE | ID: mdl-37101112

ABSTRACT

BACKGROUND: Genomic selection (GS) is revolutionizing plant and animal breeding. However, its practical implementation remains challenging because it is affected by many factors that, when not controlled, make the methodology ineffective. In addition, because GS is generally formulated as a regression problem, it has low sensitivity for selecting the best candidate individuals, since a top percentage is selected according to a ranking of predicted breeding values. RESULTS: For this reason, in this paper we propose two methods to improve the prediction accuracy of this methodology. The first consists of reformulating GS (currently formulated as a regression problem) as a binary classification problem. The second consists only of a postprocessing step that adjusts the threshold used to classify the lines predicted on their original (continuous) scale so that sensitivity and specificity are similar. The postprocessing method is applied to the predictions obtained from the conventional regression model. Both methods assume that a threshold has been defined in advance to divide the training data into top lines and non-top lines; this threshold can be set as a quantile (for example, 80% or 90%) or as the average (or maximum) performance of the checks. In the reformulation method, lines in the training set that are equal to or larger than the specified threshold are labeled as one, and all others as zero. A binary classification model is then trained with the conventional inputs, using the binary response variable in place of the continuous one. The binary classification model should be trained so that its sensitivity and specificity are similar, which ensures a reasonable probability of correctly classifying the top lines.
CONCLUSIONS: We evaluated the proposed models on seven data sets and found that the two proposed methods outperformed the conventional regression model by a large margin (with the postprocessing method, by 402.9% in terms of sensitivity, 110.04% in terms of F1 score, and 70.96% in terms of the Kappa coefficient). Between the two proposed methods, the postprocessing method was better than the reformulation as a binary classification model. The simple postprocessing method avoids the need to reformulate conventional genomic regression models as binary classification models, with similar or better performance, and significantly improves the selection of the top candidate lines. In general, both proposed methods are simple and can easily be adopted in practical breeding programs, with the guarantee that they will significantly improve the selection of the top candidate lines.

Subjects

Plant Breeding , Selection, Genetic , Animals , Genome , Genomics/methods , Phenotype , Models, Genetic
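Both methods in the abstract above start from a threshold on the observed training data, and the results are reported as sensitivity, F1 score, and the Kappa coefficient. The Python sketch below shows the labeling step, those metrics, and one possible form of the postprocessing step: a search over cutoffs on the continuous prediction scale that makes sensitivity and specificity as close as possible. The search strategy, the interpolated quantile rule, the toy data, and all function names are assumptions for illustration; in practice the cutoff would be tuned on training or validation predictions and then applied to the testing lines.

```python
def quantile(values, q):
    """Empirical quantile with linear interpolation (NumPy's default rule)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def binary_labels(values, threshold):
    """Label values at or above the threshold as 1 (top lines), the rest as 0."""
    return [1 if v >= threshold else 0 for v in values]


def classification_metrics(true, pred):
    """Sensitivity, specificity, F1 score and Cohen's Kappa for 0/1 labels."""
    tp = sum(1 for t, p in zip(true, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(true, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(true, pred) if t == 1 and p == 0)
    n = tp + tn + fp + fn
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    po = (tp + tn) / n  # observed agreement
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2  # chance agreement
    kappa = (po - pe) / (1 - pe) if pe != 1 else 0.0
    return {"sensitivity": sens, "specificity": spec, "f1": f1, "kappa": kappa}


def balanced_cutoff(y_obs, y_pred, threshold):
    """Postprocessing sketch: choose the cutoff on the continuous prediction
    scale whose sensitivity and specificity are closest to each other."""
    true = binary_labels(y_obs, threshold)
    best_cut, best_gap = None, float("inf")
    for cut in sorted(set(y_pred)):
        m = classification_metrics(true, binary_labels(y_pred, cut))
        gap = abs(m["sensitivity"] - m["specificity"])
        if gap < best_gap:
            best_cut, best_gap = cut, gap
    return best_cut


# Toy data: regression predictions shrunk toward the mean.
y_obs = [float(v) for v in range(1, 11)]
y_pred = [0.5 * v for v in y_obs]
thr = quantile(y_obs, 0.8)  # top-20% threshold on the observed scale
naive = classification_metrics(binary_labels(y_obs, thr), binary_labels(y_pred, thr))
cut = balanced_cutoff(y_obs, y_pred, thr)  # cut == 4.5
tuned = classification_metrics(binary_labels(y_obs, thr), binary_labels(y_pred, cut))
```

In this toy example the shrunken predictions never reach the observed-scale threshold, so the naive classification has a sensitivity of zero, whereas the tuned cutoff of 4.5 classifies both top lines correctly.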

20.

Multimodal deep learning methods enhance genomic prediction of wheat breeding.

Montesinos-López, Abelardo; Rivera, Carolina; Pinto, Francisco; Piñera, Francisco; Gonzalez, David; Reynolds, Mathew; Pérez-Rodríguez, Paulino; Li, Huihui; Montesinos-López, Osval A; Crossa, Jose.

G3 (Bethesda) ; 13(5)2023 05 02.

Article in English | MEDLINE | ID: mdl-36869747

ABSTRACT

While several statistical machine learning methods have been developed and studied for assessing the genomic prediction (GP) accuracy of unobserved phenotypes in plant breeding research, few methods have linked genomics and phenomics (imaging). Deep learning (DL) neural networks have been developed to increase the GP accuracy of unobserved phenotypes while simultaneously accounting for the complexity of genotype-environment interaction (GE); however, unlike conventional GP models, DL has not been investigated in settings where genomics is linked with phenomics. In this study, we used 2 wheat data sets (DS1 and DS2) to compare a novel DL method with conventional GP models. Models fitted for DS1 were GBLUP, gradient boosting machine (GBM), support vector regression (SVR), and the DL method. Results indicated that for 1 year, DL provided better GP accuracy than the other models. However, GP accuracy for the other years indicated that the GBLUP model was slightly superior to DL. DS2 comprises only genomic data from wheat lines tested for 3 years, in 2 environments (drought and irrigated), and for 2-4 traits. DS2 results showed that when predicting the irrigated environment using the drought environment, DL had higher accuracy than the GBLUP model for all analyzed traits and years. When predicting the drought environment using information from the irrigated environment, the DL and GBLUP models had similar accuracy. The DL method used in this study is novel and has a strong degree of generality, as several modules can potentially be incorporated and concatenated to produce an output from multi-input data.

Subjects

Deep Learning , Triticum , Triticum/genetics , Plant Breeding/methods , Models, Genetic , Phenotype , Genomics/methods , Genotype
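The abstract above closes by noting that the DL method is built from modules whose outputs are concatenated to handle multiple input types. The following Python sketch illustrates only that structural idea, as a forward pass with one dense module per input (genomic and phenomic) feeding a shared output layer. Layer sizes, ReLU activations, and the hand-picked weights are assumptions; no training is shown, and this is not the architecture used in the study.

```python
def dense(x, weights, bias, activation=None):
    """Fully connected layer: out[j] = sum_i x[i] * weights[i][j] + bias[j]."""
    out = [
        sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j]
        for j in range(len(bias))
    ]
    if activation == "relu":
        out = [max(0.0, v) for v in out]
    return out


def multimodal_forward(genomic, phenomic, params):
    """One module per input type; module outputs are concatenated and fed to a shared head."""
    g = dense(genomic, *params["genomic"], activation="relu")    # genomic module
    p = dense(phenomic, *params["phenomic"], activation="relu")  # phenomic (imaging) module
    merged = g + p  # list concatenation joins the two feature vectors
    return dense(merged, *params["head"])[0]  # single predicted phenotype


# Toy, untrained weights chosen by hand for illustration only.
params = {
    "genomic": ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    "phenomic": ([[2.0]], [0.0]),
    "head": ([[1.0], [1.0], [1.0]], [0.5]),
}
prediction = multimodal_forward([1.0, 2.0], [0.5], params)  # prediction == 4.5
```

Additional inputs, such as environmental covariates, would plug in as further modules whose outputs are appended to `merged`.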
