Your browser doesn't support javascript.
loading
Montrer: 20 | 50 | 100
Résultats 1 - 9 de 9
Filtrer
Plus de filtres










Base de données
Gamme d'année
1.
Plant Methods ; 20(1): 85, 2024 Jun 06.
Article de Anglais | MEDLINE | ID: mdl-38844940

RÉSUMÉ

The selection of highly productive genotypes with stable performance across environments is a major challenge of plant breeding programs due to genotype-by-environment (GE) interactions. Over the years, different metrics have been proposed that aim at characterizing the superiority and/or stability of genotype performance across environments. However, these metrics are traditionally estimated using phenotypic values only and are not well suited to an unbalanced design in which genotypes are not observed in all environments. The objective of this research was to propose and evaluate new estimators of the following GE metrics: Ecovalence, Environmental Variance, Finlay-Wilkinson regression coefficient, and Lin-Binns superiority measure. Drawing from a multi-environment genomic prediction model, we derived the best linear unbiased prediction for each GE metric. These derivations included both a squared expectation and a variance term. To assess the effectiveness of our new estimators, we conducted simulations that varied in traits and environment parameters. In our results, new estimators consistently outperformed traditional phenotype-based estimators in terms of accuracy. By incorporating a variance term into our new estimators, in addition to the squared expectation term, we were able to improve the precision of our estimates, particularly for Ecovalence in situations where heritability was low and/or sparseness was high. All methods are implemented in a new R-package: GEmetrics. These genomic-based estimators enable estimating GE metrics in unbalanced designs and predicting GE metrics for new genotypes, which should help improve the selection efficiency of high-performance and stable genotypes across environments.

3.
Plant Methods ; 20(1): 42, 2024 Mar 16.
Article de Anglais | MEDLINE | ID: mdl-38493115

RÉSUMÉ

Genomic selection (GS) has become an increasingly popular tool in plant breeding programs, propelled by declining genotyping costs, an increase in computational power, and rediscovery of the best linear unbiased prediction methodology over the past two decades. This development has led to an accumulation of extensive historical datasets with genotypic and phenotypic information, triggering the question of how to best utilize these datasets. Here, we investigate whether all available data or a subset should be used to calibrate GS models for across-year predictions in a 7-year dataset of a commercial hybrid sunflower breeding program. We employed a multi-objective optimization approach to determine the ideal years to include in the training set (TRS). Next, for a given combination of TRS years, we further optimized the TRS size and its genetic composition. We developed the Min_GRM size optimization method which consistently found the optimal TRS size, reducing dimensionality by 20% with an approximately 1% loss in predictive ability. Additionally, the Tails_GEGVs algorithm displayed potential, outperforming the use of all data by using just 60% of it for grain yield, a high-complexity, low-heritability trait. Moreover, maximizing the genetic diversity of the TRS resulted in a consistent predictive ability across the entire range of genotypic values in the test set. Interestingly, the Tails_GEGVs algorithm, due to its ability to leverage heterogeneity, enhanced predictive performance for key hybrids with extreme genotypic values. Our study provides new insights into the optimal utilization of historical data in plant breeding programs, resulting in improved GS model predictive ability.

4.
Mol Plant ; 17(4): 552-578, 2024 Apr 01.
Article de Anglais | MEDLINE | ID: mdl-38475993

RÉSUMÉ

Genomic selection, the application of genomic prediction (GP) models to select candidate individuals, has significantly advanced in the past two decades, effectively accelerating genetic gains in plant breeding. This article provides a holistic overview of key factors that have influenced GP in plant breeding during this period. We delved into the pivotal roles of training population size and genetic diversity, and their relationship with the breeding population, in determining GP accuracy. Special emphasis was placed on optimizing training population size. We explored its benefits and the associated diminishing returns beyond an optimum size. This was done while considering the balance between resource allocation and maximizing prediction accuracy through current optimization algorithms. The density and distribution of single-nucleotide polymorphisms, level of linkage disequilibrium, genetic complexity, trait heritability, statistical machine-learning methods, and non-additive effects are the other vital factors. Using wheat, maize, and potato as examples, we summarize the effect of these factors on the accuracy of GP for various traits. The search for high accuracy in GP-theoretically reaching one when using the Pearson's correlation as a metric-is an active research area as yet far from optimal for various traits. We hypothesize that with ultra-high sizes of genotypic and phenotypic datasets, effective training population optimization methods and support from other omics approaches (transcriptomics, metabolomics and proteomics) coupled with deep-learning algorithms could overcome the boundaries of current limitations to achieve the highest possible prediction accuracy, making genomic selection an effective tool in plant breeding.


Sujet(s)
Génome végétal , Amélioration des plantes , Humains , Génome végétal/génétique , Sélection génétique , Génomique , Phénotype , Génotype , Plantes , Polymorphisme de nucléotide simple/génétique
5.
Theor Appl Genet ; 136(3): 30, 2023 Mar 09.
Article de Anglais | MEDLINE | ID: mdl-36892603

RÉSUMÉ

KEY MESSAGE: Maximizing CDmean and Avg_GRM_self were the best criteria for training set optimization. A training set size of 50-55% (targeted) or 65-85% (untargeted) is needed to obtain 95% of the accuracy.  With the advent of genomic selection (GS) as a widespread breeding tool, mechanisms to efficiently design an optimal training set for GS models became more relevant, since they allow maximizing the accuracy while minimizing the phenotyping costs. The literature described many training set optimization methods, but there is a lack of a comprehensive comparison among them. This work aimed to provide an extensive benchmark among optimization methods and optimal training set size by testing a wide range of them in seven datasets, six different species, different genetic architectures, population structure, heritabilities, and with several GS models to provide some guidelines about their application in breeding programs. Our results showed that targeted optimization (uses information from the test set) performed better than untargeted (does not use test set data), especially when heritability was low. The mean coefficient of determination was the best targeted method, although it was computationally intensive. Minimizing the average relationship within the training set was the best strategy for untargeted optimization. Regarding the optimal training set size, maximum accuracy was obtained when the training set was the entire candidate set. Nevertheless, a 50-55% of the candidate set was enough to reach 95-100% of the maximum accuracy in the targeted scenario, while we needed a 65-85% for untargeted optimization. Our results also suggested that a diverse training set makes GS robust against population structure, while including clustering information was less effective. The choice of the GS model did not have a significant influence on the prediction accuracies.


Sujet(s)
Modèles génétiques , Sélection génétique , Phénotype , Génome , Génomique/méthodes , Génotype
6.
Front Plant Sci ; 12: 715910, 2021.
Article de Anglais | MEDLINE | ID: mdl-34589099

RÉSUMÉ

Genomic selection (GS) is becoming an essential tool in breeding programs due to its role in increasing genetic gain per unit time. The design of the training set (TRS) in GS is one of the key steps in the implementation of GS in plant and animal breeding programs mainly because (i) TRS optimization is critical for the efficiency and effectiveness of GS, (ii) breeders test genotypes in multi-year and multi-location trials to select the best-performing ones. In this framework, TRS optimization can help to decrease the number of genotypes to be tested and, therefore, reduce phenotyping cost and time, and (iii) we can obtain better prediction accuracies from optimally selected TRS than an arbitrary TRS. Here, we concentrate the efforts on reviewing the lessons learned from TRS optimization studies and their impact on crop breeding and discuss important features for the success of TRS optimization under different scenarios. In this article, we review the lessons learned from training population optimization in plants and the major challenges associated with the optimization of GS including population size, the relationship between training and test set (TS), update of TRS, and the use of different packages and algorithms for TRS implementation in GS. Finally, we describe general guidelines to improving the rate of genetic improvement by maximizing the use of the TRS optimization in the GS framework.

7.
Theor Appl Genet ; 134(11): 3595-3609, 2021 Nov.
Article de Anglais | MEDLINE | ID: mdl-34341832

RÉSUMÉ

KEY MESSAGE: The strong genetic structure observed in Mediterranean oats affects the predictive ability of genomic prediction as well as the performance of training set optimization methods. In this study, we investigated the efficiency of genomic prediction and training set optimization in a highly structured population of cultivars and landraces of cultivated oat (Avena sativa) from the Mediterranean basin, including white (subsp. sativa) and red (subsp. byzantina) oats, genotyped using genotype-by-sequencing markers and evaluated for agronomic traits in Southern Spain. For most traits, the predictive abilities were moderate to high with little differences between models, except for biomass for which Bayes-B showed a substantial gain compared to other models. The consistency between the structure of the training population and the population to be predicted was key to the predictive ability of genomic predictions. The predictive ability of inter-subspecies predictions was indeed much lower than that of intra-subspecies predictions for all traits. Regarding training set optimization, the linear mixed model optimization criteria (prediction error variance (PEVmean) and coefficient of determination (CDmean)) performed better than the heuristic approach "partitioning around medoids," even under high population structure. The superiority of CDmean and PEVmean could be explained by their ability to adapt the representation of each genetic group according to those represented in the population to be predicted. These results represent an important step towards the implementation of genomic prediction in oat breeding programs and address important issues faced by the genomic prediction community regarding population structure and training set optimization.


Sujet(s)
Avena/génétique , Génétique des populations , Génome végétal , Modèles génétiques , Théorème de Bayes , Grains comestibles/génétique , Génomique/méthodes , Génotype , Région méditerranéenne , Phénotype , Amélioration des plantes , Espagne
8.
Front Genet ; 12: 655287, 2021.
Article de Anglais | MEDLINE | ID: mdl-34025720

RÉSUMÉ

A major barrier to the wider use of supervised learning in emerging applications, such as genomic selection, is the lack of sufficient and representative labeled data to train prediction models. The amount and quality of labeled training data in many applications is usually limited and therefore careful selection of the training examples to be labeled can be useful for improving the accuracies in predictive learning tasks. In this paper, we present an R package, TrainSel, which provides flexible, efficient, and easy-to-use tools that can be used for the selection of training populations (STP). We illustrate its use, performance, and potentials in four different supervised learning applications within and outside of the plant breeding area.

9.
Front Plant Sci ; 11: 947, 2020.
Article de Anglais | MEDLINE | ID: mdl-32765543

RÉSUMÉ

Private and public breeding programs, as well as companies and universities, have developed different genomics technologies that have resulted in the generation of unprecedented amounts of sequence data, which bring new challenges in terms of data management, query, and analysis. The magnitude and complexity of these datasets bring new challenges but also an opportunity to use the data available as a whole. Detailed phenotype data, combined with increasing amounts of genomic data, have an enormous potential to accelerate the identification of key traits to improve our understanding of quantitative genetics. Data harmonization enables cross-national and international comparative research, facilitating the extraction of new scientific knowledge. In this paper, we address the complex issue of combining high dimensional and unbalanced omics data. More specifically, we propose a covariance-based method for combining partial datasets in the genotype to phenotype spectrum. This method can be used to combine partially overlapping relationship/covariance matrices. Here, we show with applications that our approach might be advantageous to feature imputation based approaches; we demonstrate how this method can be used in genomic prediction using heterogeneous marker data and also how to combine the data from multiple phenotypic experiments to make inferences about previously unobserved trait relationships. Our results demonstrate that it is possible to harmonize datasets to improve available information across gene-banks, data repositories, or other data resources.

SÉLECTION CITATIONS
DÉTAIL DE RECHERCHE
...