1.
BMC Genomics ; 25(1): 544, 2024 May 31.
Article in English | MEDLINE | ID: mdl-38822262

ABSTRACT

In the realm of multi-environment prediction, when the goal is to predict a complete environment using the others as a training set, the efficiency of genomic selection (GS) falls short of expectations. Genotype by environment interaction poses a challenge in achieving high prediction accuracies. Consequently, current efforts are focused on enhancing efficiency by integrating various types of inputs, such as phenomics data, environmental information, and other omics data. In this study, we sought to evaluate the impact of incorporating environmental information into the modeling process, in addition to genomic and phenomics information. Our evaluation encompassed five data sets of soft white winter wheat, and the results revealed a significant improvement in prediction accuracy, as measured by the normalized root mean square error (NRMSE), through the integration of environmental information. Notably, there was an average gain in prediction accuracy of 49.19% in terms of NRMSE across the data sets. Moreover, the observed gain in prediction accuracy ranged from 5.68% (data set 3) to 60.36% (data set 4), underscoring the substantial effect of integrating environmental information. By including genomic, phenomic, and environmental data in prediction models, plant breeding programs can improve selection efficiency across locations.
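The abstract reports gains in NRMSE without stating how the error is normalized or how the gain is computed. The sketch below assumes normalization by the mean of the observed values and a gain expressed as the percent reduction relative to the baseline model; both choices, the function names, and the yield values are illustrative assumptions, not the authors' definitions.

```python
import math

def nrmse(observed, predicted):
    """RMSE divided by the mean of the observed values (assumed normalization)."""
    n = len(observed)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    return math.sqrt(mse) / (sum(observed) / n)

def relative_gain(nrmse_baseline, nrmse_new):
    """Percent reduction in NRMSE relative to the baseline (assumed definition)."""
    return 100.0 * (nrmse_baseline - nrmse_new) / nrmse_baseline

# Hypothetical grain yields (t/ha) for four lines in a held-out environment.
observed = [4.0, 5.0, 6.0, 5.0]
pred_without_env = [5.0, 4.0, 5.0, 6.0]  # every prediction is off by 1.0
pred_with_env = [4.5, 4.5, 5.5, 5.5]     # every prediction is off by 0.5

base = nrmse(observed, pred_without_env)
new = nrmse(observed, pred_with_env)
print(round(base, 3), round(new, 3), round(relative_gain(base, new), 1))
```

Other normalizations (for example by the range or standard deviation of the observations) are also in common use, so reported percentages are only comparable when the same convention is applied.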


Subject(s)
Genomics , Phenomics , Triticum , Triticum/genetics , Genomics/methods , Gene-Environment Interaction , Phenotype , Genotype , Plant Breeding , Environment , Genome, Plant
2.
Theor Appl Genet ; 137(1): 21, 2024 Jan 14.
Article in English | MEDLINE | ID: mdl-38221602

ABSTRACT

KEY MESSAGE: Genomic prediction models for quantitative traits assume continuous and normally distributed phenotypes. In this research, we propose a novel Bayesian discrete lognormal regression model. Genomic selection is a powerful tool in modern breeding programs that uses genomic information to predict the performance of individuals and select those with desirable traits. It has revolutionized animal and plant breeding, as it allows breeders to identify the best candidates without labor-intensive and time-consuming phenotypic evaluations. While several statistical models have been developed, most of them have been for quantitative continuous traits and only a few for count responses. In this paper, we propose a discrete lognormal regression model in the Bayesian context, with a Gibbs sampler to explore the corresponding posterior distribution and make the predictions. Two datasets of disease resistance in wheat are used to evaluate the model against the traditional Gaussian model and a lognormal model. The results indicate the proposed model is a competitive and natural model for predicting count genomic traits.


Subject(s)
Models, Genetic , Plant Breeding , Humans , Animals , Bayes Theorem , Genome , Genomics/methods , Phenotype
3.
Mol Breed ; 44(1): 5, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38230361

ABSTRACT

With abundant available genomic data, genomic selection has become routine in many plant breeding programs. Multispectral data captured by UAVs showed potential for grain yield (GY) prediction in many plant species using machine learning; however, the possibilities of utilizing this data to augment genomic prediction models still need to be explored. We collected high-throughput phenotyping (HTP) multispectral data in a genotyped multi-environment large-scale field trial using two cost-effective cameras to fill this gap. We tested back to back the prediction ability of GY prediction models, including genomic (G matrix), multispectral-derived (M matrix), and environmental (E matrix) relationships using best linear unbiased predictor (BLUP) methodology in single and multi-environment scenarios. We discovered that M allows for GY prediction comparable to the G matrix and that models using both G and M matrices show superior accuracies and errors compared with G or M alone, both in single and multi-environment scenarios. We showed that the M matrix is not entirely environment-specific, and the genotypic relationships become more robust with more data capture sessions over the season. We discovered that the optimal time for data capture occurs during grain filling and that camera bands with the highest heritability are important for GY prediction using the M matrix. We showcased that GY prediction can be performed using only an RGB camera, and even a single data capture session can yield valuable data for GY prediction. This study contributes to a better understanding of multispectral data and its relationships. It provides a flexible framework for improving GS protocols without significant investments or software customization. Supplementary Information: The online version contains supplementary material available at 10.1007/s11032-024-01449-w.

4.
Field Crops Res ; 308: 109281, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38495466

ABSTRACT

Breeding for disease resistance is a central component of strategies implemented to mitigate biotic stress impacts on crop yield. Conventionally, genotypes of a plant population are evaluated through a labor-intensive process of assigning visual scores (VS) of susceptibility (or resistance) by specifically trained staff, which limits manageable volumes and repeatability of evaluation trials. Remote sensing (RS) tools have the potential to streamline phenotyping processes and to deliver more standardized results at higher throughput. Here, we use a two-year evaluation trial of three newly developed biparental populations of maize doubled haploid lines (DH) to compare the results of genomic analyses of resistance to common rust (CR) when phenotyping is either based on conventional VS or on RS-derived (vegetation) indices. As a general observation, for each population × year combination, the broad sense heritability of VS was greater than or very close to the maximum heritability across all RS indices. Moreover, results of linkage mapping as well as of genomic prediction (GP) suggest that VS data were of a higher quality, indicated by higher -log(p) values in the linkage studies and higher predictive abilities for genomic prediction. Nevertheless, despite the qualitative differences between the phenotyping methods, each successfully identified the same genomic region on chromosome 10 as being associated with disease resistance. This region is likely related to the known CR resistance locus Rp1. Our results indicate that RS technology can be used to streamline genetic evaluation processes for foliar disease resistance in maize. In particular, RS can potentially reduce costs of phenotypic evaluations and increase trialing capacities.

5.
BMC Genomics ; 24(1): 220, 2023 Apr 26.
Article in English | MEDLINE | ID: mdl-37101112

ABSTRACT

BACKGROUND: Genomic selection (GS) is revolutionizing plant and animal breeding. However, its practical implementation is still challenging, since it is affected by many factors that make the methodology ineffective when they are not under control. Also, because it is generally formulated as a regression problem, it has low sensitivity for selecting the best candidate individuals, since a top percentage is selected according to a ranking of predicted breeding values. RESULTS: For this reason, in this paper we propose two methods to improve the prediction accuracy of this methodology. One of the methods consists in reformulating the GS methodology (nowadays formulated as a regression problem) as a binary classification problem. The other consists only of a postprocessing step that adjusts the threshold used for classifying the lines predicted on their original (continuous) scale to guarantee similar sensitivity and specificity. The postprocessing method is applied to the predictions obtained from the conventional regression model. Both methods assume that a threshold is defined in advance to divide the training data into top lines and non-top lines; this threshold can be set as a quantile (for example 80%, 90%, etc.) or as the average (or maximum) of the performance of the checks. The reformulation method requires labeling as one those lines in the training set that are equal to or larger than the specified threshold, and as zero otherwise. Then we train a binary classification model with the conventional inputs, but using the binary response variable in place of the continuous response variable. The binary classifier should be trained so that sensitivity and specificity are similar, to guarantee a reasonable probability of classifying the top lines correctly.
CONCLUSIONS: We evaluated the proposed models in seven data sets and found that the two proposed methods outperformed the conventional regression model by a large margin (by 402.9% in terms of sensitivity, by 110.04% in terms of F1 score and by 70.96% in terms of Kappa coefficient, with the postprocessing method). However, of the two proposed methods, the postprocessing method was better than the reformulation as a binary classification model. The simple postprocessing method improves the accuracy of conventional genomic regression models and avoids the need to reformulate them as binary classification models, with similar or better performance, and it significantly improves the selection of the top candidate lines. In general, both proposed methods are simple and can easily be adopted in practical breeding programs, with the guarantee that they will significantly improve the selection of the top candidate lines.
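The two ideas in this abstract, labeling lines against a quantile threshold and then moving the cutoff on the predicted scale until sensitivity and specificity are similar, can be sketched as follows. This is a minimal illustration under stated assumptions: the nearest-rank quantile rule, the grid search over observed predictions, all function names, and the data are hypothetical and are not taken from the paper.

```python
def quantile_threshold(values, q):
    """Value at quantile q (0 to 1) using a nearest-rank rule on sorted data."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]

def label_top(values, threshold):
    """1 for lines at or above the threshold (top lines), 0 otherwise."""
    return [1 if v >= threshold else 0 for v in values]

def sensitivity_specificity(true_labels, predicted_labels):
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec

def balanced_cutoff(train_obs, train_pred, threshold):
    """Cutoff on the predicted scale with the smallest |sensitivity - specificity|."""
    true_labels = label_top(train_obs, threshold)
    best_cut, best_gap = None, float("inf")
    for cut in sorted(set(train_pred)):
        sens, spec = sensitivity_specificity(true_labels, label_top(train_pred, cut))
        if abs(sens - spec) < best_gap:
            best_cut, best_gap = cut, abs(sens - spec)
    return best_cut

# Hypothetical training data: regression predictions are shrunk toward the mean.
observed = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
predicted = [4.0, 4.2, 4.6, 4.9, 5.2, 5.5, 5.9, 6.3, 6.8, 7.1]

threshold = quantile_threshold(observed, 0.8)  # top 20% of observed values
truth = label_top(observed, threshold)
naive = sensitivity_specificity(truth, label_top(predicted, threshold))
cutoff = balanced_cutoff(observed, predicted, threshold)
adjusted = sensitivity_specificity(truth, label_top(predicted, cutoff))
print(threshold, naive, cutoff, adjusted)
```

In this toy case no shrunken prediction reaches the threshold on the original scale, so the unadjusted rule selects no top lines; lowering the cutoff on the predicted scale restores sensitivity.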


Subject(s)
Plant Breeding , Selection, Genetic , Animals , Genome , Genomics/methods , Phenotype , Models, Genetic
6.
BMC Plant Biol ; 23(1): 10, 2023 Jan 05.
Article in English | MEDLINE | ID: mdl-36604618

ABSTRACT

BACKGROUND: Success in any genomic prediction platform is directly dependent on establishing a representative training set. This is a complex task, even in single-trait single-environment conditions, and tends to be even more intricate when additional information from envirotyping and correlated traits is considered. Here, we aimed to design optimized training sets focused on genomic prediction, considering multi-trait multi-environment trials, and to assess how those methods may increase accuracy while reducing phenotyping costs. For that, we considered single-trait multi-environment trials and multi-trait multi-environment trials for three traits (grain yield, plant height, and ear height), two datasets, and two cross-validation schemes. Next, two strategies for designing optimized training sets were conceived, the first considering only the genomic by environment by trait interaction (GET), and the second including large-scale environmental data (W, enviromics) as genomic by enviromic by trait interaction (GWT). The effective number of individuals (genotypes × environments × traits) was assumed to be those that represent at least 98% of each kernel (GET or GWT) variation, and those individuals were then selected by a genetic algorithm based on prediction error variance criteria to compose an optimized training set for genomic prediction purposes. RESULTS: The combined use of genomic and enviromic data efficiently designs optimized training sets for genomic prediction, improving the response to selection per dollar invested by up to 145% when compared to the model without enviromic data, and even more when compared to a cross-validation scheme with a 70% training set or pure phenotypic selection. Prediction models that include G × E or enviromic data + G × E yielded better prediction ability.
CONCLUSIONS: Our findings indicate that a genomic by enviromic by trait interaction kernel associated with genetic algorithms is efficient and can be proposed as a promising approach to designing optimized training sets for genomic prediction when the variance-covariance matrix of traits is available. Additionally, great improvements in the genetic gains per dollar invested were observed, suggesting that a good allocation of resources can be deployed by using the proposed approach.


Subject(s)
Gene-Environment Interaction , Zea mays , Zea mays/genetics , Genome, Plant/genetics , Models, Genetic , Selection, Genetic , Phenotype , Genotype , Genomics/methods , Resource Allocation
7.
Int J Mol Sci ; 24(18), 2023 Sep 07.
Article in English | MEDLINE | ID: mdl-37762107

ABSTRACT

Genomic selection (GS) plays a pivotal role in hybrid prediction. It can enhance the selection of parental lines, accurately predict hybrid performance, and harness hybrid vigor. Likewise, it can optimize breeding strategies by reducing field trial requirements, expediting hybrid development, facilitating targeted trait improvement, and enhancing adaptability to diverse environments. Leveraging genomic information empowers breeders to make informed decisions and significantly improve the efficiency and success rate of hybrid breeding programs. In order to improve genomic prediction performance, we explored the incorporation of parental phenotypic information as covariates under a multi-trait framework. Approach 1, referred to as Pmean, directly utilized parental phenotypic information without any preprocessing, while approach 2, denoted as BV, replaced the direct use of phenotypic values of both parents with their respective breeding values. Although an improvement in prediction performance was observed in both approaches, with a minimum 4.24% reduction in the normalized root mean square error (NRMSE), the direct incorporation of parental phenotypic information in the Pmean approach slightly outperformed the BV approach. We also compared these two approaches using linear and nonlinear kernels, but no relevant gain was observed. Finally, our results add to the empirical evidence that the integration of parental phenotypic information helps increase the prediction performance of hybrids.


Subject(s)
Hybridization, Genetic , Models, Genetic , Genome, Plant , Phenotype , Genomics/methods , Plant Breeding
8.
Theor Appl Genet ; 134(3): 941-958, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33388884

ABSTRACT

KEY MESSAGE: Genome-wide association revealed that resistance to Striga hermonthica is influenced by multiple genomic regions with moderate effects. It is possible to increase genetic gains from selection for Striga resistance using genomic prediction. Striga hermonthica (Del.) Benth., commonly known as the purple witchweed or giant witchweed, is a serious problem for maize-dependent smallholder farmers in sub-Saharan Africa. Breeding for Striga resistance in maize is complicated due to limited genetic variation, complexity of resistance and challenges with phenotyping. This study was conducted to (i) evaluate a set of diverse tropical maize lines for their responses to Striga under artificial infestation in three environments in Kenya; (ii) detect quantitative trait loci associated with Striga resistance through genome-wide association study (GWAS); and (iii) evaluate the effectiveness of genomic prediction (GP) of Striga-related traits. An association mapping panel of 380 inbred lines was evaluated in three environments under artificial Striga infestation in replicated trials and genotyped with 278,810 single-nucleotide polymorphism (SNP) markers. Genotypic and genotype x environment variations were significant for measured traits associated with Striga resistance. Heritability estimates were moderate (0.42) to high (0.92) for measured traits. GWAS revealed 57 SNPs significantly associated with Striga resistance indicator traits and grain yield (GY) under artificial Striga infestation with low to moderate effect. A set of 32 candidate genes physically near the significant SNPs with roles in plant defense against biotic stresses were identified. GP with different cross-validations revealed that prediction of performance of lines in new environments is better than prediction of performance of new lines for all traits. 
Predictions across environments revealed high accuracy for all the traits, while inclusion of GWAS-detected SNPs led to slight increase in the accuracy. The item-based collaborative filtering approach that incorporates related traits evaluated in different environments to predict GY and Striga-related traits outperformed GP for Striga resistance indicator traits. The results demonstrated the polygenic nature of resistance to S. hermonthica, and that implementation of GP in Striga resistance breeding could potentially aid in increasing genetic gain for this important trait.


Subject(s)
Disease Resistance/genetics , Plant Breeding , Plant Diseases/genetics , Plant Weeds/physiology , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Striga/physiology , Zea mays/genetics , Alleles , Chromosome Mapping/methods , Chromosomes, Plant/genetics , Disease Resistance/immunology , Genetic Linkage , Genetic Markers , Genome-Wide Association Study , Plant Diseases/parasitology , Zea mays/immunology , Zea mays/parasitology
9.
Theor Appl Genet ; 132(5): 1587-1606, 2019 May.
Article in English | MEDLINE | ID: mdl-30747261

ABSTRACT

KEY MESSAGE: Current genome-enabled prediction models assume normally distributed errors, which makes them sensitive to outliers. We propose a model with errors assumed to follow a Laplace distribution to deal better with outliers. Current genome-enabled prediction models use regressions that fit the expected value (mean) of a response variable with errors assumed to be normally distributed, and these are often sensitive to outliers, either genetic or environmental. For this reason, we propose a robust Bayesian genome median regression (BGMR) model that fits regressions to the medians of a distribution, with errors assumed to follow a Laplace distribution to deal better with outliers. The BGMR model was evaluated under a Bayesian framework with Markov Chain Monte Carlo sampling using a location-scale mixture representation of the Laplace distribution. The BGMR was implemented with two simulated and two real genomic data sets, and we compared its prediction performance with that of a conventional genomic best linear unbiased prediction (GBLUP) model and the Laplace maximum a posteriori (LMAP) method. The prediction accuracies of BGMR were higher than those of the GBLUP and LMAP methods when there were outliers. The BGMR model could be useful to breeders who need to predict and select genotypes based on data with unknown outliers.
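The link between Laplace errors and medians can be seen in the simplest case: maximizing a Laplace likelihood for a single location parameter amounts to minimizing the sum of absolute deviations, which the sample median achieves, whereas the mean minimizes squared deviations. The sketch below shows only this intercept-only case with made-up phenotypes; it is not the BGMR model, which adds marker effects and MCMC sampling.

```python
import statistics

def sum_abs_dev(values, center):
    """Sum of absolute deviations, the quantity a Laplace likelihood penalizes."""
    return sum(abs(v - center) for v in values)

# Hypothetical phenotypes; in the second list one record is replaced by an outlier.
clean = [4.8, 5.0, 5.1, 5.2, 4.9]
contaminated = clean[:-1] + [15.0]

mean_c = statistics.mean(contaminated)
median_c = statistics.median(contaminated)

print("clean:", statistics.mean(clean), statistics.median(clean))
print("contaminated:", mean_c, median_c)
print("absolute loss at median vs mean:",
      sum_abs_dev(contaminated, median_c), sum_abs_dev(contaminated, mean_c))
```

A single extreme record pulls the mean well away from the bulk of the data while the median barely moves, which is the robustness property the abstract appeals to.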


Subject(s)
Breeding , Genome, Plant , Models, Theoretical , Plants/genetics , Bayes Theorem , Computer Simulation , Markov Chains , Monte Carlo Method , Regression Analysis
10.
Theor Appl Genet ; 132(1): 177-194, 2019 Jan.
Article in English | MEDLINE | ID: mdl-30341493

ABSTRACT

Genomic selection and high-throughput phenotyping (HTP) are promising tools to accelerate breeding gains for high-yielding and climate-resilient wheat varieties. Hence, our objective was to evaluate them for predicting grain yield (GY) in drought-stressed (DS) and late-sown heat-stressed (HS) environments of the International Maize and Wheat Improvement Center's elite yield trial nurseries. We observed that the average genomic prediction accuracies using fivefold cross-validations were 0.50 and 0.51 in the DS and HS environments, respectively. However, when a different nursery/year was used to predict another nursery/year, the average genomic prediction accuracies in the DS and HS environments decreased to 0.18 and 0.23, respectively. While genomic predictions clearly outperformed pedigree-based predictions across nurseries, they were similar to pedigree-based predictions within nurseries due to small family sizes. In populations with some full-sibs in the training population, the genomic and pedigree-based prediction accuracies were on average 0.27 and 0.35 higher than the accuracies in populations with only one progeny per cross, indicating the importance of genetic relatedness between the training and validation populations for good predictions. We also evaluated the item-based collaborative filtering approach for multivariate prediction of GY using the green normalized difference vegetation index from HTP. This approach proved to be the best strategy for across-nursery predictions, with average accuracies of 0.56 and 0.62 in the DS and HS environments, respectively. We conclude that GY is a challenging trait for across-year predictions, but GS and HTP can be integrated to increase the size of populations screened and to evaluate large unphenotyped nurseries for stress resilience within years.
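Item-based collaborative filtering treats each trait (or trait-by-environment combination) as an "item" and each line as a "user", and fills in a missing score from the line's scores on similar items. The abstract does not give the similarity measure or the scaling used, so the sketch below assumes cosine similarity on already standardized scores; the function names, column layout, and values are hypothetical.

```python
import math

def column(matrix, j):
    return [row[j] for row in matrix]

def cosine(a, b):
    """Cosine similarity over the positions observed in both vectors."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    dot = sum(x * y for x, y in pairs)
    norm_a = math.sqrt(sum(x * x for x, _ in pairs))
    norm_b = math.sqrt(sum(y * y for _, y in pairs))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict(matrix, line, item):
    """Similarity-weighted average of the line's observed scores on other items."""
    target = column(matrix, item)
    num = den = 0.0
    for j, value in enumerate(matrix[line]):
        if j == item or value is None:
            continue
        sim = cosine(target, column(matrix, j))
        num += sim * value
        den += abs(sim)
    return num / den if den else 0.0

# Rows are lines; columns are standardized GY, GNDVI at date 1, GNDVI at date 2.
# None marks the grain yield to be predicted for the last line.
scores = [
    [1.0, 1.0, 0.5],
    [-1.0, -1.0, -0.5],
    [0.5, 0.5, 1.0],
    [None, 1.0, 1.0],
]
predicted_gy = predict(scores, line=3, item=0)
print(round(predicted_gy, 3))
```

Because the prediction needs only the secondary traits measured on the target line, this kind of approach can score lines whose yield was never recorded, provided the vegetation index was captured.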


Subject(s)
Climate , Models, Genetic , Plant Breeding/methods , Triticum/genetics , Edible Grain/genetics , Genome, Plant , Genomics , Genotype , High-Throughput Screening Assays , Linear Models , Pedigree , Phenotype , Quantitative Trait, Heritable
11.
Heredity (Edinb) ; 122(4): 381-401, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30120367

ABSTRACT

Today, breeders perform genomic-assisted breeding to improve more than one trait. However, frequently there are several traits under study at one time, and the implementation of current genomic multiple-trait and multiple-environment models is challenging. Consequently, we propose a four-stage analysis for multiple-trait data in this paper. In the first stage, we perform singular value decomposition (SVD) on the resulting matrix of trait responses; in the second stage, we perform multiple trait analysis on transformed responses. In stages three and four, we collect and transform the traits back to their original state and obtain the parameter estimates and the predictions on the scale of the variables prior to transformation. The results of the proposed method are compared, in terms of parameter estimation and prediction accuracy, with the results of the Bayesian multiple-trait and multiple-environment model (BMTME) previously described in the literature. We found that the proposed method based on SVD produced similar results, in terms of parameter estimation and prediction accuracy, to those obtained with the BMTME model. Moreover, the proposed multiple-trait method is attractive because it can be implemented using current single-trait genomic prediction software, which yields a more efficient algorithm in terms of computation.
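One way to write the transform and back-transform steps is shown below. The notation is an assumption made for illustration: Y is the n × t matrix of trait responses (lines by traits), and the abstract does not state the exact scaling the authors apply to the transformed responses.

```latex
\begin{align}
Y &= U D V^{\top}, \qquad V^{\top} V = V V^{\top} = I_t
  && \text{(stage 1: SVD of the trait matrix)} \\
Y^{*} &= Y V = U D
  && \text{(transformed responses with orthogonal columns)} \\
\hat{y}^{*}_{j} &= f_j(\text{markers}), \quad j = 1, \dots, t
  && \text{(stage 2: one single-trait model per column)} \\
\hat{Y} &= \hat{Y}^{*} V^{\top}
  && \text{(stages 3 and 4: back-transformation)}
\end{align}
```

Since V is orthogonal, Y = Y* Vᵀ holds exactly, so any prediction made on the transformed scale maps back to the original traits through the same matrix.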


Subject(s)
Gene-Environment Interaction , Genomics/methods , Models, Genetic , Quantitative Trait, Heritable , Algorithms , Bayes Theorem , Breeding , Genome/genetics , Genotype , Phenotype , Selection, Genetic
12.
Genes (Basel) ; 15(3), 2024 Feb 24.
Article in English | MEDLINE | ID: mdl-38540344

ABSTRACT

Genomic selection (GS) is revolutionizing plant breeding. However, its practical implementation is still challenging, since there are many factors that affect its accuracy. For this reason, this research explores data augmentation with the goal of improving its accuracy. Deep neural networks with data augmentation (DA) generate synthetic data from the original training set to increase the training set and to improve the prediction performance of any statistical or machine learning algorithm. There is much empirical evidence of their success in many computer vision applications. Due to this, DA was explored in the context of GS using 14 real datasets. We found empirical evidence that DA is a powerful tool to improve the prediction accuracy, since we improved the prediction accuracy of the top lines in the 14 datasets under study. On average, across datasets and traits, the gain in prediction performance of the DA approach relative to the conventional method in the top 20% of lines in the testing set was 108.4% in terms of the NRMSE and 107.4% in terms of the MAAPE, but a worse performance was observed on the whole testing set. We encourage more empirical evaluations to support our findings.


Subject(s)
Genome, Plant , Genomics , Phenotype , Machine Learning , Neural Networks, Computer
13.
Data Brief ; 54: 110300, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38586147

ABSTRACT

Three F2-derived biparental doubled haploid (DH) maize populations were generated for genetic mapping of resistance to common rust. Each of the three populations has the same susceptible parent, but a different resistance donor parent. Populations 1 and 3 consist of 320 lines each; population 2 consists of 260 lines. The DH lines were evaluated for their susceptibility to common rust in two years and with two replications in each year. For phenotyping, a visual score (VS) for susceptibility was assigned. Additionally, unmanned aerial vehicle (UAV) derived multispectral and thermal infrared data were recorded and combined in different vegetation indices ("remote sensing", RS). The DH lines were genotyped with the DArTseq method to obtain data on single nucleotide polymorphisms (SNPs). After quality control, 9051 markers remained. Missing values were "imputed" by the empirical mean of the marker scores of the respective locus. We used the data to compare genome-wide association studies and genomic prediction based on different phenotyping methods, that is, either VS or RS data. The data may be of interest for reuse, for instance for benchmarking genomic prediction models, for phytopathological studies addressing common rust, or for specifications of vegetation indices.
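The imputation rule described here, replacing each missing marker score by the empirical mean of the observed scores at the same locus, can be sketched in a few lines. The 0/2 allele coding, the use of `None` for missing calls, and the function name are assumptions for illustration; the data set's actual coding may differ.

```python
def impute_locus_means(markers):
    """Replace None in each marker column by the mean of its observed scores."""
    n_loci = len(markers[0])
    means = []
    for j in range(n_loci):
        observed = [row[j] for row in markers if row[j] is not None]
        # A locus with no observed scores falls back to 0.0 (arbitrary choice).
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in markers]

# Hypothetical marker scores for four DH lines at three loci (None = missing).
markers = [
    [0, 2, None],
    [2, None, 0],
    [0, 2, 2],
    [None, 2, 2],
]
imputed = impute_locus_means(markers)
for row in imputed:
    print(row)
```

Mean imputation leaves each locus mean unchanged but shrinks its variance, which is why the abstract puts "imputed" in quotation marks.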

14.
Mol Plant ; 17(4): 552-578, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38475993

ABSTRACT

Genomic selection, the application of genomic prediction (GP) models to select candidate individuals, has significantly advanced in the past two decades, effectively accelerating genetic gains in plant breeding. This article provides a holistic overview of key factors that have influenced GP in plant breeding during this period. We delved into the pivotal roles of training population size and genetic diversity, and their relationship with the breeding population, in determining GP accuracy. Special emphasis was placed on optimizing training population size. We explored its benefits and the associated diminishing returns beyond an optimum size. This was done while considering the balance between resource allocation and maximizing prediction accuracy through current optimization algorithms. The density and distribution of single-nucleotide polymorphisms, level of linkage disequilibrium, genetic complexity, trait heritability, statistical machine-learning methods, and non-additive effects are the other vital factors. Using wheat, maize, and potato as examples, we summarize the effect of these factors on the accuracy of GP for various traits. The search for high accuracy in GP (theoretically reaching one when using Pearson's correlation as a metric) is an active research area, and accuracy is as yet far from optimal for various traits. We hypothesize that with ultra-high sizes of genotypic and phenotypic datasets, effective training population optimization methods and support from other omics approaches (transcriptomics, metabolomics and proteomics) coupled with deep-learning algorithms could overcome the boundaries of current limitations to achieve the highest possible prediction accuracy, making genomic selection an effective tool in plant breeding.


Subject(s)
Genome, Plant , Plant Breeding , Humans , Genome, Plant/genetics , Selection, Genetic , Genomics , Phenotype , Genotype , Plants , Polymorphism, Single Nucleotide/genetics
15.
Front Plant Sci ; 15: 1324090, 2024.
Article in English | MEDLINE | ID: mdl-38504889

ABSTRACT

In the field of plant breeding, various machine learning models have been developed and studied to evaluate the genomic prediction (GP) accuracy of unseen phenotypes. Deep learning has shown promise. However, most studies on deep learning in plant breeding have been limited to small datasets, and only a few have explored its application in moderate-sized datasets. In this study, we aimed to address this limitation by utilizing a moderately large dataset. We examined the performance of a deep learning (DL) model and compared it with the widely used and powerful genomic best linear unbiased prediction (GBLUP) model. The goal was to assess the GP accuracy in the context of a five-fold cross-validation strategy and when predicting complete environments using the DL model. The results revealed that the DL model outperformed the GBLUP model in terms of GP accuracy for two out of the five included traits in the five-fold cross-validation strategy, with similar results in the other traits. This indicates the superiority of the DL model in predicting these specific traits. Furthermore, when predicting complete environments using the leave-one-environment-out (LOEO) approach, the DL model demonstrated competitive performance. It is worth noting that the DL model employed in this study extends a previously proposed multi-modal DL model, which had been primarily applied to image data but with small datasets. By utilizing a moderately large dataset, we were able to evaluate the performance and potential of the DL model in a more information-rich and challenging plant breeding scenario.
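Leave-one-environment-out (LOEO) validation holds out every record from one environment, trains on the rest, and repeats this for each environment. The generator below is a minimal sketch of the splitting step only; the record layout, field names, and values are hypothetical, and model fitting is left out.

```python
def leave_one_environment_out(records):
    """Yield (held_out_env, train, test), holding out each environment once."""
    environments = sorted({r["env"] for r in records})
    for env in environments:
        train = [r for r in records if r["env"] != env]
        test = [r for r in records if r["env"] == env]
        yield env, train, test

# Hypothetical phenotype records: two lines observed in three environments.
records = [
    {"line": "L1", "env": "E1", "yield": 5.1},
    {"line": "L2", "env": "E1", "yield": 4.7},
    {"line": "L1", "env": "E2", "yield": 6.0},
    {"line": "L2", "env": "E2", "yield": 5.5},
    {"line": "L1", "env": "E3", "yield": 3.9},
    {"line": "L2", "env": "E3", "yield": 4.2},
]

folds = list(leave_one_environment_out(records))
for env, train, test in folds:
    print(env, len(train), len(test))
```

Unlike random five-fold cross-validation, no record from the target environment ever enters the training set, which is why LOEO is usually the harder scenario.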

16.
Front Plant Sci ; 15: 1349569, 2024.
Article in English | MEDLINE | ID: mdl-38812738

ABSTRACT

Introduction: Because genomic selection (GS) is a predictive methodology, it needs to guarantee high prediction accuracies for practical implementations. However, since many factors affect the prediction performance of this methodology, its practical implementation still needs to be improved in many breeding programs. For this reason, many strategies have been explored to improve the prediction performance of this methodology. Methods: When environmental covariates are incorporated as inputs in the genomic prediction models, this information only sometimes helps increase prediction performance. For this reason, this investigation explores the use of feature engineering on the environmental covariates to enhance the prediction performance of genomic prediction models. Results and discussion: We found that across data sets, feature engineering helps reduce prediction error, relative to including the environmental covariates without feature engineering, by 761.625% across predictors. These results are very promising regarding the potential of feature engineering to enhance prediction accuracy. However, since a significant gain in prediction accuracy was observed in only some data sets, further research is required to guarantee a robust feature engineering strategy to incorporate the environmental covariates.

17.
G3 (Bethesda) ; 14(2), 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38079160

ABSTRACT

Genomic selection is revolutionizing plant breeding. However, its practical implementation is still very challenging, since predicted values do not necessarily have high correspondence to the observed phenotypic values. When the goal is to predict within-family, it is not always possible to obtain reasonable accuracies, which is of paramount importance to improve the selection process. For this reason, in this research, we propose the Adversaria-Boruta (AB) method, which combines the virtues of the adversarial validation (AV) method and the Boruta feature selection method. The AB method operates primarily by minimizing the disparity between training and testing distributions. This is accomplished by reducing the weight assigned to markers that display the most significant differences between the training and testing sets. The AB method therefore builds a weighted genomic relationship matrix that is implemented with the genomic best linear unbiased predictor (GBLUP) model. The proposed AB method is compared using 12 real data sets with the GBLUP model that uses a nonweighted genomic relationship matrix. Our results show that the proposed AB method outperforms the GBLUP by 8.6, 19.7, and 9.8% in terms of Pearson's correlation, mean square error, and normalized root mean square error, respectively. Our results support that the proposed AB method is a useful tool to improve the prediction accuracy of a complete family; however, we encourage other investigators to evaluate the AB method to increase the empirical evidence of its potential.


Subject(s)
Genetic Models , Single Nucleotide Polymorphism , Genome , Genomics/methods , Linear Models , Phenotype , Genotype
18.
Front Genet ; 14: 1124218, 2023.
Article in English | MEDLINE | ID: mdl-37065497

ABSTRACT

With the human population continuing to increase worldwide, there is pressure to employ novel technologies to increase genetic gain in plant breeding programs that contribute to nutrition and food security. Genomic selection (GS) has the potential to increase genetic gain because it can accelerate the breeding cycle, increase the accuracy of estimated breeding values, and improve selection accuracy. Moreover, recent advances in high-throughput phenotyping in plant breeding programs offer the opportunity to integrate genomic and phenotypic data to increase prediction accuracy. In this paper, we applied GS to winter wheat data integrating two types of inputs: genomic and phenotypic. We observed the best accuracy for grain yield when combining both genomic and phenotypic inputs, while using only genomic information fared poorly. In general, the predictions with only phenotypic information were very competitive with those using both sources of information, and in many cases using only phenotypic information provided the best accuracy. Our results are encouraging because they show that the prediction accuracy of GS can be enhanced by integrating high-quality phenotypic inputs into the models.

19.
Plant Genome ; 16(2): e20305, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36815225

ABSTRACT

Sparse testing is essential to increase the efficiency of the genomic selection methodology, as the same efficiency (in this case prediction power) can be obtained while evaluating fewer genotypes in the field. For this reason, it is important to evaluate the existing methods for allocating lines to environments. With this goal, four methods (M1-M4) to allocate lines to environments were evaluated in the context of a multi-trait genomic prediction problem: M1 denotes the allocation of a fraction (subset) of lines in all locations, M2 denotes the allocation of a fraction of lines with some shared lines in locations but not arranged based on the balanced incomplete block design (BIBD) principle, M3 denotes the random allocation of a subset of lines to locations, and M4 denotes the allocation of a subset of lines to locations using the BIBD principle. The evaluation was done using seven real multi-environment data sets common in plant breeding programs. We found that the best method was M4 and the worst was M1, while no important differences were found between M3 and M4. We concluded that M4 and M3 are efficient in the context of sparse testing for multi-trait prediction.
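
Three of the allocation ideas described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not give the exact procedures, so the function names, the `fraction` and `envs_per_line` parameters, and the particular balanced construction (cycling each line through every combination of environments, which gives every pair of environments the same number of shared lines when the number of lines is a multiple of the number of combinations) are hypothetical choices, not the authors' implementation. M2 is omitted because the abstract does not say how its shared lines are chosen.

```python
import itertools
import random


def allocate_same_subset(lines, envs, fraction, seed=0):
    """M1-style: one random subset of lines, tested in every environment."""
    rng = random.Random(seed)
    size = round(fraction * len(lines))
    chosen = sorted(rng.sample(lines, size))
    return {env: list(chosen) for env in envs}


def allocate_random(lines, envs, fraction, seed=0):
    """M3-style: an independent random subset of lines per environment."""
    rng = random.Random(seed)
    size = round(fraction * len(lines))
    return {env: sorted(rng.sample(lines, size)) for env in envs}


def allocate_balanced(lines, envs, envs_per_line):
    """M4-style (assumed construction): each line goes to `envs_per_line`
    environments, cycling through all combinations of environments."""
    combos = list(itertools.combinations(envs, envs_per_line))
    allocation = {env: [] for env in envs}
    for idx, line in enumerate(lines):
        for env in combos[idx % len(combos)]:
            allocation[env].append(line)
    return allocation


# Toy example: 12 lines, 4 environments, each line tested in 2 environments.
lines = [f"L{i}" for i in range(12)]
envs = ["E1", "E2", "E3", "E4"]
balanced = allocate_balanced(lines, envs, envs_per_line=2)
same = allocate_same_subset(lines, envs, fraction=0.5)
rand = allocate_random(lines, envs, fraction=0.5)
```

In this toy setting all three schemes place six lines in each environment, so they use the same number of field plots and differ only in which lines overlap between environments.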


Subject(s)
Plant Genome , Plant Breeding , Phenotype , Genotype , Genomics
20.
G3 (Bethesda) ; 13(5)2023 May 02.
Article in English | MEDLINE | ID: mdl-36869747

ABSTRACT

While several statistical machine learning methods have been developed and studied for assessing the genomic prediction (GP) accuracy of unobserved phenotypes in plant breeding research, few methods have linked genomics and phenomics (imaging). Deep learning (DL) neural networks have been developed to increase the GP accuracy of unobserved phenotypes while simultaneously accounting for the complexity of genotype-environment interaction (GE); however, unlike conventional GP models, DL has not been investigated for cases in which genomics is linked with phenomics. In this study we used 2 wheat data sets (DS1 and DS2) to compare a novel DL method with conventional GP models. Models fitted for DS1 were GBLUP, gradient boosting machine (GBM), support vector regression (SVR), and the DL method. Results indicated that for 1 year, DL provided better GP accuracy than the other models. However, GP accuracy obtained for other years indicated that the GBLUP model was slightly superior to the DL method. DS2 comprises only genomic data from wheat lines tested for 3 years, 2 environments (drought and irrigated), and 2-4 traits. DS2 results showed that when predicting the irrigated environment with the drought environment, DL had higher accuracy than the GBLUP model in all analyzed traits and years. When predicting the drought environment with information on the irrigated environment, the DL model and the GBLUP model had similar accuracy. The DL method used in this study is novel and presents a strong degree of generalization, as several modules can potentially be incorporated and concatenated to produce an output for a multi-input data structure.


Subject(s)
Deep Learning , Triticum , Triticum/genetics , Plant Breeding/methods , Genetic Models , Phenotype , Genomics/methods , Genotype