ABSTRACT
Advances in sequencing technology allow whole plant genomes to be sequenced with high quality. Combining genotypic and phenotypic data in genomic prediction helps breeders to select crossing partners in partially phenotyped populations. In plant breeding programs, the cost of sequencing entire breeding populations still exceeds available genotyping budgets. Hence, single nucleotide polymorphism (SNP) arrays remain the main genotyping method, although arrays cannot capture the full genome-wide and population-wide diversity. A compromise is to genotype the entire population with an SNP array and a subset of the population with whole-genome sequencing. Both datasets can then be used to impute markers from whole-genome sequencing onto the entire population. Here, we evaluate whether imputation of whole-genome sequencing data enhances genomic predictions, using data from a nested association mapping population of rapeseed (Brassica napus). Employing two cross-validation schemes that mimic the prediction of close and distant relatives, we show that imputed marker data do not significantly improve prediction accuracy, likely due to redundancy in relationship estimates and imputation errors. Simulation studies showed only small improvements, further corroborating these findings. We conclude that, through relationship and linkage disequilibrium, SNP arrays already capture the information that imputation adds.
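The comparison above rests on cross-validated genomic prediction. A minimal sketch, assuming a GBLUP-style kernel model with a VanRaden genomic relationship matrix, a known variance ratio `lam` (standing for sigma_e^2 / sigma_g^2), a training-set mean in place of a generalized least squares estimate, and plain random k-fold cross-validation; the study's exact models and its two cross-validation schemes are not reproduced, and all function names are hypothetical. Comparing SNP-array and imputed marker sets would amount to building `g` from each marker matrix and scoring both on the same folds.

```python
import random
import statistics


def vanraden_g(markers):
    """Genomic relationship matrix (VanRaden 2008, method 1) from 0/1/2 codes."""
    n, m = len(markers), len(markers[0])
    # frequency of the counted allele at each marker
    p = [sum(ind[j] for ind in markers) / (2 * n) for j in range(m)]
    # centre each genotype by twice the allele frequency
    z = [[ind[j] - 2 * p[j] for j in range(m)] for ind in markers]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    return [[sum(a * b for a, b in zip(zi, zk)) / scale for zk in z] for zi in z]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def gblup_predict(g, y, train, test, lam):
    """Kernel (GBLUP-style) prediction of test individuals from training ones."""
    mu = statistics.fmean(y[i] for i in train)
    # (G_train + lam * I) alpha = y_train - mu
    lhs = [[g[i][k] + (lam if i == k else 0.0) for k in train] for i in train]
    alpha = solve(lhs, [y[i] - mu for i in train])
    return [mu + sum(g[t][i] * a for i, a in zip(train, alpha)) for t in test]


def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cv_accuracy(g, y, lam=1.0, k=5, seed=1):
    """Random k-fold cross-validation; accuracy = r(predicted, observed)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    pred, obs = [], []
    for fold in range(k):
        test = idx[fold::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        pred += gblup_predict(g, y, train, test, lam)
        obs += [y[t] for t in test]
    return pearson(pred, obs)


# hypothetical toy data: 40 individuals x 100 unlinked random SNPs
rng = random.Random(42)
geno = [[rng.choice((0, 1, 2)) for _ in range(100)] for _ in range(40)]
effects = [rng.gauss(0, 1) for _ in range(100)]
y = [sum(x * e for x, e in zip(ind, effects)) + rng.gauss(0, 5) for ind in geno]
print(round(cv_accuracy(vanraden_g(geno), y), 2))
```

The toy markers are independent of each other, so they carry no linkage disequilibrium and cannot reproduce the redundancy effect reported in the study; they only exercise the cross-validation machinery.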
Subject(s)
Brassica napus, Plant Genome, Single Nucleotide Polymorphism, Whole Genome Sequencing, Brassica napus/genetics, Whole Genome Sequencing/methods, Plant Breeding/methods, Linkage Disequilibrium, Genomics/methods, Genotype
ABSTRACT
Recombination is a key mechanism for promoting genetic variability in breeding. Multiparental populations (MPPs) constitute an excellent platform for precise genotype phasing, identification of genome-wide crossovers (COs), estimation of recombination frequencies, and construction of recombination maps. Here, we introduce haploMAGIC, a pipeline that detects COs in MPPs from single-nucleotide polymorphism (SNP) data by exploiting pedigree relationships for accurate genotype phasing and inference of grandparental haplotypes. haploMAGIC applies filtering to prevent false-positive COs caused by genotyping errors (GEs), a common problem in high-throughput SNP analysis of complex plant genomes. To this end, it discards haploblocks that do not reach a specified minimum number of informative alleles. A performance analysis using populations simulated with AlphaSimR revealed that haploMAGIC improves on existing CO detection methods in terms of recall and precision, most notably when GE rates are high. Furthermore, we used haploMAGIC to construct recombination maps from high-resolution genotype data of two large multiparental populations of winter rapeseed (Brassica napus). The results demonstrate the applicability of the pipeline in real-world scenarios and show good correlations in recombination frequency with alternative software. Therefore, we propose haploMAGIC as an accurate tool for CO detection in MPPs that is robust to GEs.
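The haploblock filter can be illustrated on a single phased gamete. A minimal sketch, assuming phasing has already assigned each informative marker a grandparental origin label (`"A"` or `"B"`): runs of identical labels form haploblocks, blocks with fewer than `min_alleles` markers are dropped as probable genotyping errors, neighbouring blocks of the same origin are merged, and COs are placed in the intervals between the remaining blocks. This is not haploMAGIC's implementation; the function names and marker positions are hypothetical.

```python
from itertools import groupby


def haploblocks(calls):
    """Collapse per-marker origin calls into [origin, first, last, n_markers] runs."""
    blocks, start = [], 0
    for origin, run in groupby(calls):
        n = sum(1 for _ in run)
        blocks.append([origin, start, start + n - 1, n])
        start += n
    return blocks


def detect_crossovers(calls, positions, min_alleles=3):
    """Return (left, right) position intervals that contain a crossover."""
    # drop haploblocks supported by too few informative alleles (likely GEs)
    kept = [b for b in haploblocks(calls) if b[3] >= min_alleles]
    # merge neighbours that now share the same grandparental origin
    merged = []
    for block in kept:
        if merged and merged[-1][0] == block[0]:
            merged[-1][2] = block[2]
            merged[-1][3] += block[3]
        else:
            merged.append(list(block))
    # a CO lies between the last marker of one block and the first of the next
    return [(positions[a[2]], positions[b[1]]) for a, b in zip(merged, merged[1:])]


calls = list("AAAAABAAAABBBBBB")  # one isolated "B" call, e.g. a genotyping error
positions = list(range(0, 160, 10))  # hypothetical marker positions
print(detect_crossovers(calls, positions))                 # [(90, 100)]
print(detect_crossovers(calls, positions, min_alleles=1))  # [(40, 50), (50, 60), (90, 100)]
```

Without the filter (`min_alleles=1`), the single erroneous call produces two spurious COs; with the filter, only the real transition remains.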
Subject(s)
Genotyping Techniques, Haplotypes, Single Nucleotide Polymorphism, Genetic Recombination, Genotyping Techniques/methods, Brassica napus/genetics, Software, Genotype, Plant Genome, Genetic Crossing Over
ABSTRACT
KEY MESSAGE: Simulation-planned pre-breeding can increase the efficiency of starting a hybrid breeding program. Starting a hybrid breeding program commonly involves grouping the initial germplasm into two pools and subsequently selecting for general combining ability. Studies of pre-breeding steps preceding selection for general combining ability are not available. Our goals were (1) to use computer simulations based on DNA markers and testcross data to plan crosses that genetically separate two initial germplasm pools of rapeseed, (2) to carry out the planned crosses, and (3) to experimentally verify the pool separation as well as the increase in testcross performance. We designed a crossing program consisting of four cycles of recombination. In each cycle, the experimentally generated material was used to plan the subsequent crossing cycle with computer simulations. After the crossing program was completed, the initially overlapping pools were clearly separated in principal coordinate plots. Doubled haploid lines derived from the material of crossing cycles 1 and 2 showed an increase in relative testcross performance for yield of about 5% per cycle. We conclude that simulation-designed pre-breeding crossing schemes, carried out before general combining ability-based selection in a newly started hybrid breeding program, can save time and resources and, in addition, conserve more of the initial genetic variation than starting the hybrid breeding program directly with general combining ability-based selection.
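The study judged pool separation visually in principal coordinate plots. As a simpler numerical check (swapped in here, not the study's method), one can compare the mean genetic distance between pools with the mean distance within pools. This sketch assumes biallelic SNPs coded 0/1/2 and uses the modified Rogers' distance; the function names and toy genotypes are hypothetical, and each pool needs at least two individuals.

```python
from itertools import combinations
from statistics import fmean


def rogers_distance(x, y):
    """Modified Rogers' distance between two individuals coded 0/1/2 at biallelic SNPs."""
    # per locus, the allele frequency difference is (x - y) / 2 for each of two alleles
    return (sum((a - b) ** 2 for a, b in zip(x, y)) / (4 * len(x))) ** 0.5


def pool_separation(pool_a, pool_b):
    """Mean between-pool distance minus mean within-pool distance."""
    within = [rogers_distance(x, y)
              for pool in (pool_a, pool_b)
              for x, y in combinations(pool, 2)]
    between = [rogers_distance(x, y) for x in pool_a for y in pool_b]
    return fmean(between) - fmean(within)


# hypothetical toy pools
separated = pool_separation([[0, 0, 1], [0, 1, 0]], [[2, 2, 1], [2, 1, 2]])
overlapping = pool_separation([[0, 1, 2], [2, 1, 0]], [[0, 1, 2], [2, 1, 0]])
print(round(separated, 3), round(overlapping, 3))
```

Positive values indicate that individuals are, on average, closer to members of their own pool than to the other pool; values at or below zero indicate overlapping pools.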
Subject(s)
Brassica napus, Brassica rapa, Brassica napus/genetics, Plant Breeding, Brassica rapa/genetics, Computer Simulation, Haploidy
ABSTRACT
Testcross factorials in newly established hybrid breeding programs are often highly unbalanced and incomplete, and are characterized by a predominance of special combining ability (SCA) over general combining ability (GCA). This results in low efficiency of GCA-based selection. Machine learning algorithms might improve the prediction of hybrid performance in such testcross factorials, as they have been applied successfully to find complex underlying patterns in sparse data. Our objective was to compare the prediction accuracy of machine learning algorithms with that of GCA-based prediction and genomic best linear unbiased prediction (GBLUP) in six unbalanced incomplete factorials from hybrid breeding programs of rapeseed, wheat, and corn. We investigated a range of machine learning algorithms with three types of predictor variables: (a) information on the parentage of hybrids, (b) in addition, the performance of hybrids from crosses of the parental lines with other crossing partners, and (c) genotypic marker data. In two highly incomplete and unbalanced factorials from rapeseed, in which SCA variance contributed considerably to the genetic variance, stacked ensembles of gradient boosting machines based on parentage information outperformed GCA prediction. Compared with GCA prediction, the stacked ensembles increased prediction accuracy from 0.39 to 0.45 and from 0.48 to 0.54, respectively. Without marker data, the stacked ensembles reached prediction accuracies comparable to those of GBLUP, which requires marker data. We conclude that hybrid prediction with stacked ensembles of gradient boosting machines based on parentage information is a promising approach that warrants further investigation with other data sets in which SCA variance is high.
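GCA-based prediction, the baseline in this comparison, models a hybrid as the overall mean plus the GCA of each parent, leaving SCA as the remainder. A minimal sketch on hypothetical, balanced toy data, estimating each GCA as the parent's marginal mean deviation from the overall mean. In unbalanced, incomplete factorials such as those described above, these marginal means are biased, which is why mixed-model (BLUP) estimation is normally used instead; the function names are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def estimate_gca(hybrids):
    """Marginal-mean GCA estimates from (female, male, performance) records."""
    mu = fmean(v for _, _, v in hybrids)
    by_parent = defaultdict(list)
    for female, male, value in hybrids:
        # tag the role so a line used both as female and male is kept apart
        by_parent[("F", female)].append(value)
        by_parent[("M", male)].append(value)
    gca = {key: fmean(values) - mu for key, values in by_parent.items()}
    return mu, gca


def predict_gca(mu, gca, female, male):
    """GCA prediction of a hybrid: mu + GCA(female) + GCA(male)."""
    return mu + gca[("F", female)] + gca[("M", male)]


# hypothetical complete 2 x 3 factorial with purely additive (GCA-only) effects
female_gca = {"F1": 1.0, "F2": -1.0}
male_gca = {"M1": 2.0, "M2": 0.0, "M3": -2.0}
data = [(f, m, 10.0 + gf + gm)
        for f, gf in female_gca.items()
        for m, gm in male_gca.items()]

mu, gca = estimate_gca(data)
# SCA of each cross = observed performance minus its GCA prediction
sca = {(f, m): v - predict_gca(mu, gca, f, m) for f, m, v in data}
print(predict_gca(mu, gca, "F1", "M1"), sca)
```

Because the toy data contain no interaction, all SCA residuals are zero; when SCA dominates, these residuals carry most of the genetic signal that GCA prediction misses.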
ABSTRACT
Over the last two decades, the application of genomic selection has been extensively studied in various crop species, and it has become common practice to report prediction accuracies using cross validation. However, genomic prediction accuracies obtained from random cross validation can be strongly inflated by population or family structure, a characteristic shared by many breeding populations. An understanding of the effect of population and family structure on prediction accuracy is essential for the successful application of genomic selection in plant breeding programs. The objective of this study was to make this effect and its implications for practical breeding programs comprehensible to breeders and scientists with a limited background in quantitative genetics and genomic selection theory. We therefore compared genomic prediction accuracies obtained from different random cross validation approaches and from within-family prediction in three different prediction scenarios. We used a highly structured population of 940 Brassica napus hybrids from 46 testcross families and two subpopulations. Our demonstrations show how genomic prediction accuracies obtained from among-family predictions in random cross validation and from within-family predictions capture different measures of prediction accuracy. While among-family prediction accuracy measures the prediction accuracy of both the parent average component and the Mendelian sampling term, within-family prediction accuracy measures only how accurately the Mendelian sampling term can be predicted. With this paper, we aim to foster a critical approach to different measures of genomic prediction accuracy and a careful analysis of values observed in genomic selection experiments and reported in the literature.
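The difference between among-family and within-family accuracy can be made concrete with a few lines of code. A minimal sketch, assuming phenotypes and predictions with known family labels: the overall correlation rewards predicting family means (standing in for the parent average component), whereas correlating deviations from family means isolates the within-family (Mendelian sampling) part. Pooling within-family deviations instead of averaging per-family correlations, and using observed phenotypes instead of true breeding values, are simplifying assumptions; the toy data and function names are hypothetical.

```python
import random
from collections import defaultdict
from statistics import fmean


def pearson(x, y):
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # no variation at this level means nothing is predicted: report 0
    if sxx <= 1e-12 or syy <= 1e-12:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def within_family_deviations(values, families):
    """Subtract each family's mean from its members' values."""
    groups = defaultdict(list)
    for v, f in zip(values, families):
        groups[f].append(v)
    means = {f: fmean(vs) for f, vs in groups.items()}
    return [v - means[f] for v, f in zip(values, families)]


def accuracies(observed, predicted, families):
    """Return (overall r, pooled within-family r)."""
    overall = pearson(predicted, observed)
    within = pearson(within_family_deviations(predicted, families),
                     within_family_deviations(observed, families))
    return overall, within


# hypothetical data: 4 families of 10, predictions equal to the family means only
rng = random.Random(7)
families = [f for f in range(4) for _ in range(10)]
observed = [10.0 * f + rng.gauss(0, 1) for f in families]
predicted = [10.0 * f for f in families]
overall, within = accuracies(observed, predicted, families)
print(round(overall, 3), within)
```

Predicting nothing but family means already yields a high overall correlation, while the within-family accuracy is zero, which is the inflation under family structure that the abstract describes.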
ABSTRACT
The need to improve hybrid performance, abiotic stress tolerance, and disease resistance without compromising seed quality makes the targeted capture of untapped diversity a major objective for crop breeders. Here we introduce the concept of Heterotic Haplotype Capture (HHC), in which genome sequence imputation is used to trace novel heterozygous chromosome blocks contributing to hybrid performance in large, structured populations of interrelated F1 hybrids containing interesting new diversity for breeding.