Search | VHL Search Portal

1.

Genome-wide association analysis reveals new insights into the genetic architecture of defensive, agro-morphological and quality-related traits in cassava.

Rabbi, Ismail Yusuf; Kayondo, Siraj Ismail; Bauchet, Guillaume; Yusuf, Muyideen; Aghogho, Cynthia Idhigu; Ogunpaimo, Kayode; Uwugiaren, Ruth; Smith, Ikpan Andrew; Peteti, Prasad; Agbona, Afolabi; Parkes, Elizabeth; Lydia, Ezenwaka; Wolfe, Marnin; Jannink, Jean-Luc; Egesi, Chiedozie; Kulakow, Peter.

Plant Mol Biol ; 109(3): 195-213, 2022 Jun.

Article in English | MEDLINE | ID: mdl-32734418

ABSTRACT

KEY MESSAGE: More than 40 QTLs associated with 14 stress-related, quality and agro-morphological traits were identified. A catalogue of favourable SNP markers for MAS and a list of candidate genes are provided. Cassava (Manihot esculenta) is one of the most important starchy root crops in the tropics due to its adaptation to marginal environments. Genetic progress in this clonally propagated crop can be accelerated through the discovery of markers and candidate genes that could be used in cassava breeding programs. We carried out a genome-wide association study (GWAS) using a panel of 5130 clones developed at the International Institute of Tropical Agriculture-Nigeria. The population was genotyped at more than 100,000 SNP markers via genotyping-by-sequencing (GBS). Genomic regions underlying genetic variation for 14 traits classified broadly into four categories: biotic stress (cassava mosaic disease and cassava green mite severity); quality (dry matter content and carotenoid content) and plant agronomy (harvest index and plant type) were investigated. We also included several agro-morphological traits related to leaves, stems and roots with high heritability. In total, 41 significant associations were uncovered. While some of the identified loci matched with those previously reported, we present additional association signals for the traits. We provide a catalogue of favourable alleles at the most significant SNP for each trait-locus combination and candidate genes occurring within the GWAS hits. These resources provide a foundation for the development of markers that could be used in cassava breeding programs and candidate genes for functional validation.

Subject(s)

Manihot , Genome-Wide Association Study , Manihot/genetics , Phenotype , Plant Breeding , Polymorphism, Single Nucleotide , Quantitative Trait Loci/genetics

2.

Genome-wide association mapping and genomic prediction of yield-related traits and starch pasting properties in cassava.

Phumichai, Chalermpol; Aiemnaka, Pornsak; Nathaisong, Piyaporn; Hunsawattanakul, Sirikan; Fungfoo, Phasakorn; Rojanaridpiched, Chareinsuk; Vichukit, Vichan; Kongsil, Pasajee; Kittipadakul, Piya; Wannarat, Wannasiri; Chunwongse, Julapark; Tongyoo, Pumipat; Kijkhunasatian, Chookiat; Chotineeranat, Sunee; Piyachomkwan, Kuakoon; Wolfe, Marnin D; Jannink, Jean-Luc; Sorrells, Mark E.

Theor Appl Genet ; 135(1): 145-171, 2022 Jan.

Article in English | MEDLINE | ID: mdl-34661695

ABSTRACT

KEY MESSAGE: GWAS identified eight yield-related, peak starch type of waxy and wild-type starch and 21 starch pasting property-related traits (QTLs). Prediction ability of eight GS models resulted in low to high predictability, depending on trait, heritability, and genetic architecture. Cassava is both a food and an industrial crop in Africa, South America, and Asia, but knowledge of the genes that control yield and starch pasting properties remains limited. We carried out a genome-wide association study to clarify the molecular mechanisms underlying these traits and to explore marker-based breeding approaches. We estimated the predictive ability of genomic selection (GS) using parametric, semi-parametric, and nonparametric GS models with a panel of 276 cassava genotypes from Thai Tapioca Development Institute, International Center for Tropical Agriculture, International Institute of Tropical Agriculture, and other breeding programs. The cassava panel was genotyped via genotyping-by-sequencing, and 89,934 single-nucleotide polymorphism (SNP) markers were identified. A total of 31 SNPs associated with yield, starch type, and starch properties traits were detected by the fixed and random model circulating probability unification (FarmCPU), Bayesian-information and linkage-disequilibrium iteratively nested keyway and compressed mixed linear model, respectively. GS models were developed, and forward predictabilities using all the prediction methods resulted in values of -0.001 to 0.71 for the four yield-related traits and 0.33 to 0.82 for the seven starch pasting property traits. This study provides additional insight into the genetic architecture of these important traits for the development of markers that could be used in cassava breeding programs.

Subject(s)

Chromosomes, Plant , Genome, Plant , Manihot/genetics , Plant Breeding , Chromosome Mapping , Edible Grain , Genetic Markers , Genome-Wide Association Study , Linkage Disequilibrium , Manihot/growth & development

3.

Multi-omics prediction of oat agronomic and seed nutritional traits across environments and in distantly related populations.

Hu, Haixiao; Campbell, Malachy T; Yeats, Trevor H; Zheng, Xuying; Runcie, Daniel E; Covarrubias-Pazaran, Giovanny; Broeckling, Corey; Yao, Linxing; Caffe-Treml, Melanie; Gutiérrez, Luci A; Smith, Kevin P; Tanaka, James; Hoekenga, Owen A; Sorrells, Mark E; Gore, Michael A; Jannink, Jean-Luc.

Theor Appl Genet ; 134(12): 4043-4054, 2021 Dec.

Article in English | MEDLINE | ID: mdl-34643760

ABSTRACT

KEY MESSAGE: Integration of multi-omics data improved prediction accuracies of oat agronomic and seed nutritional traits in multi-environment trials and distantly related populations in addition to the single-environment prediction. Multi-omics prediction has been shown to be superior to genomic prediction with genome-wide DNA-based genetic markers (G) for predicting phenotypes. However, most of the existing studies were based on historical datasets from one environment; therefore, they were unable to evaluate the efficiency of multi-omics prediction in multi-environment trials and distantly related populations. To fill those gaps, we designed a systematic experiment to collect omics data and evaluate 17 traits in two oat breeding populations planted in single and multiple environments. In the single-environment trial, transcriptomic BLUP (T), metabolomic BLUP (M), G + T, G + M, and G + T + M models showed greater prediction accuracy than GBLUP for 5, 10, 11, 17, and 17 traits, respectively, and metabolites generally performed better than transcripts when combined with SNPs. In the multi-environment trial, multi-trait models with omics data outperformed both counterpart multi-trait GBLUP models and single-environment omics models, and the highest prediction accuracy was achieved when modeling genetic covariance as an unstructured covariance model. We also demonstrated that omics data can be used to prioritize loci from one population with omics data to improve genomic prediction in a distantly related population using a two-kernel linear model that accommodated both likely causal loci with large effects and loci that explain little or no phenotypic variance. We propose that the two-kernel linear model is superior to most genomic prediction models that assume each variant is equally likely to affect the trait and can be used to improve prediction accuracy for any trait with prior knowledge of genetic architecture.

Subject(s)

Avena/genetics , Models, Genetic , Nutritive Value , Seeds/chemistry , Avena/chemistry , Genetic Markers , Metabolome , Phenotype , Plant Breeding , Polymorphism, Single Nucleotide , Transcriptome

4.

Heritable temporal gene expression patterns correlate with metabolomic seed content in developing hexaploid oat seed.

Hu, Haixiao; Gutierrez-Gonzalez, Juan J; Liu, Xinfang; Yeats, Trevor H; Garvin, David F; Hoekenga, Owen A; Sorrells, Mark E; Gore, Michael A; Jannink, Jean-Luc.

Plant Biotechnol J ; 18(5): 1211-1222, 2020 May.

Article in English | MEDLINE | ID: mdl-31677224

ABSTRACT

Oat ranks sixth in world cereal production and has a higher content of health-promoting compounds compared with other cereals. However, there is neither a robust oat reference genome nor transcriptome. Using deeply sequenced full-length mRNA libraries of oat cultivar Ogle-C, a de novo high-quality and comprehensive oat seed transcriptome was assembled. With this reference transcriptome and QuantSeq 3' mRNA sequencing, gene expression was quantified during seed development from 22 diverse lines across six time points. Transcript expression showed higher correlations between adjacent time points. Based on differentially expressed genes, we identified 22 major temporal co-expression (TCoE) patterns of gene expression and revealed enriched gene ontology biological processes. Within each TCoE set, highly correlated transcripts, putatively commonly affected by genetic background, were clustered and termed genetic co-expression (GCoE) sets. Seventeen of the 22 TCoE sets had GCoE sets with median heritabilities higher than 0.50, and these heritability estimates were much higher than that estimated from permutation analysis, with no divergence observed in cluster sizes between permutation and non-permutation analyses. Linear regression between 634 metabolites from mature seeds and the PC1 score of each of the GCoE sets showed significantly lower p-values than permutation analysis. Temporal expression patterns of oat avenanthramides and lipid biosynthetic genes were concordant with previous studies of avenanthramide biosynthetic enzyme activity and lipid accumulation. This study expands our understanding of physiological processes that occur during oat seed maturation and provides plant breeders the means to change oat seed composition through targeted manipulation of key pathways.

Subject(s)

Avena , Gene Expression Regulation, Plant , Avena/genetics , Gene Expression Profiling , Gene Expression Regulation, Plant/genetics , Metabolomics , Seeds/genetics , Transcriptome/genetics

5.

A framework for genomics-informed ecophysiological modeling in plants.

Wang, Diane R; Guadagno, Carmela R; Mao, Xiaowei; Mackay, D Scott; Pleban, Jonathan R; Baker, Robert L; Weinig, Cynthia; Jannink, Jean-Luc; Ewers, Brent E.

J Exp Bot ; 70(9): 2561-2574, 2019 Apr 29.

Article in English | MEDLINE | ID: mdl-30825375

ABSTRACT

Dynamic process-based plant models capture complex physiological response across time, carrying the potential to extend simulations out to novel environments and lend mechanistic insight to observed phenotypes. Despite the translational opportunities for varietal crop improvement that could be unlocked by linking natural genetic variation to first principles-based modeling, these models are challenging to apply to large populations of related individuals. Here we use a combination of model development, experimental evaluation, and genomic prediction in Brassica rapa L. to set the stage for future large-scale process-based modeling of intraspecific variation. We develop a new canopy growth submodel for B. rapa within the process-based model Terrestrial Regional Ecosystem Exchange Simulator (TREES), test input parameters for feasibility of direct estimation with observed phenotypes across cultivated morphotypes and indirect estimation using genomic prediction on a recombinant inbred line population, and explore model performance on an in silico population under non-stressed and mild water-stressed conditions. We find evidence that the updated whole-plant model has the capacity to distill genotype by environment interaction (G×E) into tractable components. The framework presented offers a means to link genetic variation with environment-modulated plant response and serves as a stepping stone towards large-scale prediction of unphenotyped, genetically related individuals under untested environmental scenarios.

Subject(s)

Genomics/methods , Plants/genetics , Ecosystem , Genotype , Models, Genetic , Stress, Physiological/genetics , Stress, Physiological/physiology

6.

High-throughput phenotyping platforms enhance genomic selection for wheat grain yield across populations and cycles in early stage.

Sun, Jin; Poland, Jesse A; Mondal, Suchismita; Crossa, José; Juliana, Philomin; Singh, Ravi P; Rutkoski, Jessica E; Jannink, Jean-Luc; Crespo-Herrera, Leonardo; Velu, Govindan; Huerta-Espino, Julio; Sorrells, Mark E.

Theor Appl Genet ; 132(6): 1705-1720, 2019 Jun.

Article in English | MEDLINE | ID: mdl-30778634

ABSTRACT

Genomic selection (GS) models have been validated for many quantitative traits in wheat (Triticum aestivum L.) breeding. However, those models are mostly constrained within the same growing cycle and the extension of GS to the case of across cycles has been a challenge, mainly due to the low predictive accuracy resulting from two factors: reduced genetic relationships between different families and augmented environmental variances between cycles. Using the data collected from diverse field conditions at the International Wheat and Maize Improvement Center, we evaluated GS for grain yield in three elite yield trials across three wheat growing cycles. The objective of this project was to employ the secondary traits, canopy temperature, and green normalized difference vegetation index, which are closely associated with grain yield from high-throughput phenotyping platforms, to improve prediction accuracy for grain yield. The ability to predict grain yield was evaluated reciprocally across three cycles with or without secondary traits. Our results indicate that prediction accuracy increased by an average of 146% for grain yield across cycles with secondary traits. In addition, our results suggest that secondary traits phenotyped during wheat heading and early grain filling stages were optimal for enhancing the prediction accuracy for grain yield.

Subject(s)

Genetics, Population , Genome, Plant , Genomics/methods , Plant Breeding/methods , Selection, Genetic , Triticum/genetics , Genetic Markers , Phenotype , Triticum/growth & development

7.

A statistical framework for detecting mislabeled and contaminated samples using shallow-depth sequence data.

Chan, Ariel W; Williams, Amy L; Jannink, Jean-Luc.

BMC Bioinformatics ; 19(1): 478, 2018 Dec 12.

Article in English | MEDLINE | ID: mdl-30541436

ABSTRACT

BACKGROUND: Researchers typically sequence a given individual multiple times, either re-sequencing the same DNA sample (technical replication) or sequencing different DNA samples collected on the same individual (biological replication) or both. Before merging the data from these replicate sequence runs, it is important to verify that no errors, such as DNA contamination or mix-ups, occurred during the data collection pipeline. Methods to detect such errors exist but are often ad hoc, cannot handle missing data and several require phased data. Because they require some combination of genotype calling, imputation, and haplotype phasing, these methods are unsuitable for error detection in low- to moderate-depth sequence data where such tasks are difficult to perform accurately. Additionally, because most existing methods employ a pairwise-comparison approach for error detection rather than joint analysis of the putative replicates, results may be difficult to interpret. RESULTS: We introduce a new method for error detection suitable for shallow-, moderate-, and high-depth sequence data. Using Bayes Theorem, we calculate the posterior probability distribution over the set of relations describing the putative replicates and infer which of the samples originated from an identical genotypic source. CONCLUSIONS: Our method addresses key limitations of existing approaches and produced highly accurate results in simulation experiments. Our method is implemented as an R package called BIGRED (Bayes Inferred Genotype Replicate Error Detector), which is freely available for download: https://github.com/ac2278/BIGRED .

Subject(s)

Databases, Nucleic Acid/standards , Sequence Analysis, DNA/methods , Humans

8.

Accuracy of genomic selection to predict maize single-crosses obtained through different mating designs.

Fritsche-Neto, Roberto; Akdemir, Deniz; Jannink, Jean-Luc.

Theor Appl Genet ; 131(5): 1153-1162, 2018 May.

Article in English | MEDLINE | ID: mdl-29445844

ABSTRACT

KEY MESSAGE: Testcross is the worst mating design to use as a training set to predict maize single-crosses that would be obtained through full diallel or North Carolina design II. Even though many papers have been published about genomic prediction (GP) in maize, the best mating design to build the training population has not been defined yet. Such design must maximize the accuracy given constraints on costs and on the logistics of the crosses to be made. Hence, the aims of this work were: (1) empirically evaluate the effect of the mating designs, used as training set, on genomic selection to predict maize single-crosses obtained through full diallel and North Carolina design II, (2) and identify the possibility of reducing the number of crosses and parents to compose these training sets. Our results suggest that testcross is the worst mating design to use as a training set to predict maize single-crosses that would be obtained through full diallel or North Carolina design II. Moreover, North Carolina design II is the best training set to predict hybrids taken from full diallel. However, hybrids from full diallel and North Carolina design II can be well predicted using optimized training sets, which also allow reducing the total number of crosses to be made. Nevertheless, the number of parents and the crosses per parent in the training sets should be maximized.

Subject(s)

Plant Breeding , Selection, Genetic , Zea mays/genetics , Crosses, Genetic , Genotype , Models, Genetic , Phenotype

9.

Correction to: Accuracy of genomic selection to predict maize single-crosses obtained through different mating designs.

Fritsche-Neto, Roberto; Akdemir, Deniz; Jannink, Jean-Luc.

Theor Appl Genet ; 131(7): 1603, 2018 Jul.

Article in English | MEDLINE | ID: mdl-29796770

ABSTRACT

Unfortunately, the first author name of the above-mentioned article was incorrectly published in the original publication. The complete correct name should read as follows.

10.

Genomic selection and association mapping in rice (Oryza sativa): effect of trait genetic architecture, training population composition, marker number and statistical model on accuracy of rice genomic selection in elite, tropical rice breeding lines.

Spindel, Jennifer; Begum, Hasina; Akdemir, Deniz; Virk, Parminder; Collard, Bertrand; Redoña, Edilberto; Atlin, Gary; Jannink, Jean-Luc; McCouch, Susan R.

PLoS Genet ; 11(2): e1004982, 2015 Feb.

Article in English | MEDLINE | ID: mdl-25689273

ABSTRACT

Genomic Selection (GS) is a new breeding method in which genome-wide markers are used to predict the breeding value of individuals in a breeding population. GS has been shown to improve breeding efficiency in dairy cattle and several crop plant species, and here we evaluate for the first time its efficacy for breeding inbred lines of rice. We performed a genome-wide association study (GWAS) in conjunction with five-fold GS cross-validation on a population of 363 elite breeding lines from the International Rice Research Institute's (IRRI) irrigated rice breeding program and herein report the GS results. The population was genotyped with 73,147 markers using genotyping-by-sequencing. The training population, statistical method used to build the GS model, number of markers, and trait were varied to determine their effect on prediction accuracy. For all three traits, genomic prediction models outperformed prediction based on pedigree records alone. Prediction accuracies ranged from 0.31 and 0.34 for grain yield and plant height to 0.63 for flowering time. Analyses using subsets of the full marker set suggest that using one marker every 0.2 cM is sufficient for genomic selection in this collection of rice breeding materials. RR-BLUP was the best performing statistical method for grain yield where no large effect QTL were detected by GWAS, while for flowering time, where a single very large effect QTL was detected, the non-GS multiple linear regression method outperformed GS models. For plant height, in which four mid-sized QTL were identified by GWAS, random forest produced the most consistently accurate GS models. Our results suggest that GS, informed by GWAS interpretations of genetic architecture and population structure, could become an effective tool for increasing the efficiency of rice breeding as the costs of genotyping continue to decline.

Subject(s)

Genome-Wide Association Study , Oryza/genetics , Quantitative Trait Loci/genetics , Selection, Genetic , Animal Husbandry , Animals , Breeding , Cattle , Chromosome Mapping , Genetic Markers , Genome, Plant , Phenotype

11.

Locally epistatic models for genome-wide prediction and association by importance sampling.

Akdemir, Deniz; Jannink, Jean-Luc; Isidro-Sánchez, Julio.

Genet Sel Evol ; 49(1): 74, 2017 Oct 17.

Article in English | MEDLINE | ID: mdl-29041917

ABSTRACT

BACKGROUND: In statistical genetics, an important task involves building predictive models of the genotype-phenotype relationship to attribute a proportion of the total phenotypic variance to the variation in genotypes. Many models have been proposed to incorporate additive genetic effects into prediction or association models. Currently, there is a scarcity of models that can adequately account for gene by gene or other forms of genetic interactions, and there is an increased interest in using marker annotations in genome-wide prediction and association analyses. In this paper, we discuss a hybrid modeling method which combines parametric mixed modeling and non-parametric rule ensembles. RESULTS: This approach gives us a flexible class of models that can be used to capture additive, locally epistatic genetic effects, gene-by-background interactions and allows us to incorporate one or more annotations into the genomic selection or association models. We use benchmark datasets that cover a range of organisms and traits in addition to simulated datasets to illustrate the strengths of this approach. CONCLUSIONS: In this paper, we describe a new strategy for incorporating genetic interactions into genomic prediction and association models. This strategy results in accurate models, with sometimes significantly higher accuracies than that of a standard additive model.

Subject(s)

Algorithms , Epistasis, Genetic , Genome-Wide Association Study/methods , Models, Genetic , Animals , Mice , Oryza/genetics , Triticum/genetics , Zea mays/genetics

12.

Accuracies of univariate and multivariate genomic prediction models in African cassava.

Okeke, Uche Godfrey; Akdemir, Deniz; Rabbi, Ismail; Kulakow, Peter; Jannink, Jean-Luc.

Genet Sel Evol ; 49(1): 88, 2017 Dec 04.

Article in English | MEDLINE | ID: mdl-29202685

ABSTRACT

BACKGROUND: Genomic selection (GS) promises to accelerate genetic gain in plant breeding programs especially for crop species such as cassava that have long breeding cycles. Practically, to implement GS in cassava breeding, it is necessary to evaluate different GS models and to develop suitable models for an optimized breeding pipeline. In this paper, we compared (1) prediction accuracies from a single-trait (uT) and a multi-trait (MT) mixed model for a single-environment genetic evaluation (Scenario 1), and (2) accuracies from a compound symmetric multi-environment model (uE) parameterized as a univariate multi-kernel model to a multivariate (ME) multi-environment mixed model that accounts for genotype-by-environment interaction for multi-environment genetic evaluation (Scenario 2). For these analyses, we used 16 years of public cassava breeding data for six target cassava traits and a fivefold cross-validation scheme with 10-repeat cycles to assess model prediction accuracies. RESULTS: In Scenario 1, the MT models had higher prediction accuracies than the uT models for all traits and locations analyzed, which amounted to on average a 40% improved prediction accuracy. For Scenario 2, we observed that the ME model had on average (across all locations and traits) a 12% improved prediction accuracy compared to the uE model. CONCLUSIONS: We recommend the use of multivariate mixed models (MT and ME) for cassava genetic evaluation. These models may be useful for other plant species.

Subject(s)

Genome, Plant , Genomics , Manihot/genetics , Models, Genetic , Breeding , Genotype , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Selection, Genetic

13.

Identification and distribution of the NBS-LRR gene family in the Cassava genome.

Lozano, Roberto; Hamblin, Martha T; Prochnik, Simon; Jannink, Jean-Luc.

BMC Genomics ; 16: 360, 2015 May 07.

Article in English | MEDLINE | ID: mdl-25948536

ABSTRACT

BACKGROUND: Plant resistance genes (R genes) exist in large families and usually contain both a nucleotide-binding site domain and a leucine-rich repeat domain, denoted NBS-LRR. The genome sequence of cassava (Manihot esculenta) is a valuable resource for analysing the genomic organization of resistance genes in this crop. RESULTS: With searches for Pfam domains and manual curation of the cassava gene annotations, we identified 228 NBS-LRR type genes and 99 partial NBS genes. These represent almost 1% of the total predicted genes and show high sequence similarity to proteins from other plant species. Furthermore, 34 contained an N-terminal toll/interleukin (TIR)-like domain, and 128 contained an N-terminal coiled-coil (CC) domain. 63% of the 327 R genes occurred in 39 clusters on the chromosomes. These clusters are mostly homogeneous, containing NBS-LRRs derived from a recent common ancestor. CONCLUSIONS: This study provides insight into the evolution of NBS-LRR genes in the cassava genome; the phylogenetic and mapping information may aid efforts to further characterize the function of these predicted R genes.

Subject(s)

Genome, Plant , Manihot/genetics , Plant Proteins/genetics , Chromosome Mapping , Gene Expression Regulation, Plant , Host-Pathogen Interactions/genetics , Manihot/microbiology , Phylogeny , Plant Proteins/classification , Plant Proteins/metabolism , Protein Structure, Tertiary , Xanthomonas axonopodis/pathogenicity

14.

Training set optimization under population structure in genomic selection.

Isidro, Julio; Jannink, Jean-Luc; Akdemir, Deniz; Poland, Jesse; Heslot, Nicolas; Sorrells, Mark E.

Theor Appl Genet ; 128(1): 145-158, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25367380

ABSTRACT

KEY MESSAGE: Population structure must be evaluated before optimization of the training set population. Maximizing the phenotypic variance captured by the training set is important for optimal performance. The optimization of the training set (TRS) in genomic selection has received much interest in both animal and plant breeding, because it is critical to the accuracy of the prediction models. In this study, five different TRS sampling algorithms, stratified sampling, mean of the coefficient of determination (CDmean), mean of predictor error variance (PEVmean), stratified CDmean (StratCDmean) and random sampling, were evaluated for prediction accuracy in the presence of different levels of population structure. In the presence of population structure, the most phenotypic variation captured by a sampling method in the TRS is desirable. The wheat dataset showed mild population structure, and CDmean and stratified CDmean methods showed the highest accuracies for all the traits except for test weight and heading date. The rice dataset had strong population structure and the approach based on stratified sampling showed the highest accuracies for all traits. In general, CDmean minimized the relationship between genotypes in the TRS, maximizing the relationship between TRS and the test set. This makes it suitable as an optimization criterion for long-term selection. Our results indicated that the best selection criterion used to optimize the TRS seems to depend on the interaction of trait architecture and population structure.

Subject(s)

Genetics, Population/methods , Genomics/methods , Selection, Genetic , Breeding , Cluster Analysis , Genotype , Models, Statistical , Oryza/genetics , Phenotype , Principal Component Analysis , Triticum/genetics

15.

An alternative covariance estimator to investigate genetic heterogeneity in populations.

Heslot, Nicolas; Jannink, Jean-Luc.

Genet Sel Evol ; 47: 93, 2015 Nov 26.

Article in English | MEDLINE | ID: mdl-26612537

ABSTRACT

BACKGROUND: For genomic prediction and genome-wide association studies (GWAS) using mixed models, covariance between individuals is estimated using molecular markers. Based on the properties of mixed models, using available molecular data for prediction is optimal if this covariance is known. Under this assumption, adding individuals to the analysis should never be detrimental. However, some empirical studies showed that increasing training population size decreased prediction accuracy. Recently, results from theoretical models indicated that even if marker density is high and the genetic architecture of traits is controlled by many loci with small additive effects, the covariance between individuals, which depends on relationships at causal loci, is not always well estimated by the whole-genome kinship. RESULTS: We propose an alternative covariance estimator named K-kernel, to account for potential genetic heterogeneity between populations that is characterized by a lack of genetic correlation, and to limit the information flow between a priori unknown populations in a trait-specific manner. This is similar to a multi-trait model and parameters are estimated by REML and, in extreme cases, it can allow for an independent genetic architecture between populations. As such, K-kernel is useful to study the problem of the design of training populations. K-kernel was compared to other covariance estimators or kernels to examine its fit to the data, cross-validated accuracy and suitability for GWAS on several datasets. It provides a significantly better fit to the data than the genomic best linear unbiased prediction model and, in some cases it performs better than other kernels such as the Gaussian kernel, as shown by an empirical null distribution. In GWAS simulations, alternative kernels control type I errors as well as or better than the classical whole-genome kinship and increase statistical power. No or small gains were observed in cross-validated prediction accuracy. CONCLUSIONS: This alternative covariance estimator can be used to gain insight into trait-specific genetic heterogeneity by identifying relevant sub-populations that lack genetic correlation between them. Genetic correlation can be 0 between identified sub-populations by performing automatic selection of relevant sets of individuals to be included in the training population. It may also increase statistical power in GWAS.

Subject(s)

Genetic Heterogeneity , Genetics, Population , Models, Genetic , Algorithms , Breeding , Computer Simulation , Datasets as Topic , Genetic Association Studies , Genetic Markers , Genetics, Population/methods , Genome-Wide Association Study/methods , Genomics/methods , Models, Statistical , Quantitative Trait, Heritable , Reproducibility of Results

16.

Optimization of genomic selection training populations with a genetic algorithm.

Akdemir, Deniz; Sanchez, Julio I; Jannink, Jean-Luc.

Genet Sel Evol ; 47: 38, 2015 May 06.

Article in English | MEDLINE | ID: mdl-25943105

ABSTRACT

In this article, we imagine a breeding scenario with a population of individuals that have been genotyped but not phenotyped. We derived a computationally efficient statistic that uses this genetic information to measure the reliability of genomic estimated breeding values (GEBV) for a given set of individuals (test set) based on a training set of individuals. We used this reliability measure with a genetic algorithm scheme to find an optimized training set from a larger set of candidate individuals. This subset was phenotyped to create the training set that was used in a genomic selection model to estimate GEBV in the test set. Our results show that, compared to a random sample of the same size, the use of a set of individuals selected by our method improved accuracies. We implemented the proposed training selection methodology on four sets of data on Arabidopsis, wheat, rice and maize. This dynamic model building process that takes genotypes of the individuals in the test sample into account while selecting the training individuals improves the performance of genomic selection models.
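The selection scheme described above can be sketched as a small genetic algorithm over subsets of candidate individuals. The abstract does not give the reliability statistic, so the criterion below is a stand-in chosen for illustration: the mean of x_t'(X'X + λI)⁻¹x_t over the test individuals, a prediction-error-variance-like quantity under a ridge model in which lower is better. All function names, the GA settings (population size, mutation rate, union-based crossover) and the simulated genotypes are assumptions, not the authors' implementation.

```python
import random
from typing import List, Sequence


def solve_linear(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def mean_pev(train_idx, test_idx, markers, lam: float = 1.0) -> float:
    """Stand-in criterion (assumption): mean x_t' (X'X + lam I)^-1 x_t over the test set."""
    m = len(markers[0])
    xtx = [
        [sum(markers[i][j] * markers[i][k] for i in train_idx) + (lam if j == k else 0.0)
         for k in range(m)]
        for j in range(m)
    ]
    total = 0.0
    for t in test_idx:
        x = markers[t]
        w = solve_linear(xtx, x)
        total += sum(xj * wj for xj, wj in zip(x, w))
    return total / len(test_idx)


def select_training_set(markers, candidates, test_idx, size,
                        generations=30, pop_size=20, seed=0) -> List[int]:
    """Genetic algorithm over fixed-size subsets of the candidate indices."""
    rng = random.Random(seed)

    def score(subset):
        return mean_pev(subset, test_idx, markers)

    population = [rng.sample(candidates, size) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score)
        elite = population[: pop_size // 2]  # elitism: the best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            pool = sorted(set(p1) | set(p2))
            child = rng.sample(pool, size)  # crossover: draw from the parents' union
            if rng.random() < 0.3:  # mutation: swap one member for an unused candidate
                unused = [c for c in candidates if c not in child]
                if unused:
                    child[rng.randrange(size)] = rng.choice(unused)
            children.append(child)
        population = elite + children
    return sorted(min(population, key=score))


# Hypothetical data: 30 genotyped individuals, 5 markers; the last 5 form the test set.
data_rng = random.Random(1)
markers = [[data_rng.randint(0, 2) for _ in range(5)] for _ in range(30)]
test_idx = list(range(25, 30))
candidates = list(range(25))
chosen = select_training_set(markers, candidates, test_idx, size=8)
```

Only the genotypes of the test individuals enter the criterion, which mirrors the point made in the abstract: the training set is tailored to the individuals to be predicted before any phenotyping is done.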

Subject(s)

Algorithms , Genomics/methods , Plant Breeding/methods , Arabidopsis/genetics , Genotype , Genetic Models , Oryza/genetics , Phenotype , Triticum/genetics , Zea mays/genetics

17.

solGS: a web-based tool for genomic selection.

Tecle, Isaak Y; Edwards, Jeremy D; Menda, Naama; Egesi, Chiedozie; Rabbi, Ismail Y; Kulakow, Peter; Kawuki, Robert; Jannink, Jean-Luc; Mueller, Lukas A.

BMC Bioinformatics ; 15: 398, 2014 Dec 14.

Article in English | MEDLINE | ID: mdl-25495537

ABSTRACT

BACKGROUND: Genomic selection (GS) promises to improve accuracy in estimating breeding values and genetic gain for quantitative traits compared to traditional breeding methods. Its reliance on high-throughput genome-wide markers and its statistical complexity, however, pose a serious challenge in data management, analysis, and sharing. A bioinformatics infrastructure for data storage and access, and a user-friendly web-based tool for analyzing and sharing output, are needed to make GS more practical for breeders. RESULTS: We have developed a web-based tool, called solGS, for predicting genomic estimated breeding values (GEBVs) of individuals, using a Ridge-Regression Best Linear Unbiased Predictor (RR-BLUP) model. It has an intuitive web interface for selecting a training population for modeling and estimating genomic estimated breeding values of selection candidates. It estimates phenotypic correlation and heritability of traits and selection indices of individuals. Raw data is stored in a generic database schema, Chado Natural Diversity, co-developed by multiple database groups. Analysis output is graphically visualized and can be interactively explored online or downloaded in text format. An instance of its implementation can be accessed at the NEXTGEN Cassava breeding database, http://cassavabase.org/solgs. CONCLUSIONS: solGS enables breeders to store raw data and estimate GEBVs of individuals online, in an intuitive and interactive workflow. It can be adapted to any breeding program.
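To make the RR-BLUP step concrete, here is a minimal sketch of ridge-regression marker effects and the resulting GEBVs for selection candidates. It is not solGS code: the function names and toy data are hypothetical, and the shrinkage parameter `lam` (the ratio of residual to marker-effect variance) is simply supplied by hand, whereas an RR-BLUP fit would normally estimate it from the data, for example by REML.

```python
from typing import List, Sequence, Tuple


def solve_linear(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


Model = Tuple[float, List[float], List[float]]


def rrblup_fit(markers: List[List[float]], phenotypes: List[float], lam: float) -> Model:
    """Fit ridge-regression marker effects; lam is assumed known here."""
    n, m = len(markers), len(markers[0])
    col_means = [sum(row[j] for row in markers) / n for j in range(m)]
    mu = sum(phenotypes) / n
    z = [[row[j] - col_means[j] for j in range(m)] for row in markers]
    # Ridge normal equations: (Z'Z + lam * I) u = Z'(y - mu)
    lhs = [
        [sum(z[i][j] * z[i][k] for i in range(n)) + (lam if j == k else 0.0)
         for k in range(m)]
        for j in range(m)
    ]
    rhs = [sum(z[i][j] * (phenotypes[i] - mu) for i in range(n)) for j in range(m)]
    return mu, col_means, solve_linear(lhs, rhs)


def predict_gebv(model: Model, candidates: List[List[float]]) -> List[float]:
    """GEBV = sum over markers of (centred dosage * estimated marker effect)."""
    _, col_means, effects = model
    return [
        sum((row[j] - col_means[j]) * effects[j] for j in range(len(effects)))
        for row in candidates
    ]


# Hypothetical training population: the phenotype is 10 + 2 * (dosage at marker 0).
train_markers = [[0, 1], [1, 0], [2, 1], [1, 2], [2, 2], [0, 0]]
train_pheno = [10.0, 12.0, 14.0, 12.0, 14.0, 10.0]
model = rrblup_fit(train_markers, train_pheno, lam=1.0)
gebv = predict_gebv(model, [[2, 0], [0, 2]])
```

With these toy data the estimated effect of marker 0 is positive but shrunk below its simulated value of 2, and the candidate carrying two copies of that allele receives the higher GEBV.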

Subject(s)

Breeding , Manihot/genetics , Software , Genomics , Internet , Manihot/physiology , Quantitative Trait Loci

18.

Population genetics of genomics-based crop improvement methods.

Hamblin, Martha T; Buckler, Edward S; Jannink, Jean-Luc.

Trends Genet ; 27(3): 98-106, 2011 Mar.

Article in English | MEDLINE | ID: mdl-21227531

ABSTRACT

Many genome-wide association studies (GWAS) in humans are concluding that, even with very large sample sizes and high marker densities, most of the genetic basis of complex traits may remain unexplained. At the same time, recent research in plant GWAS is showing much greater success with fewer resources. Both GWAS and genomic selection (GS), a method for predicting phenotypes by the use of genome-wide marker data, are receiving considerable attention among plant breeders. In this review we explore how differences in population genetic histories, as well as past selection for traits of interest, have produced trait architectures and patterns of linkage disequilibrium (LD) that frequently differ dramatically between domesticated plants and humans, making detection of quantitative trait loci (QTL) effects in crops more rewarding and less costly than in humans.

Subject(s)

Agriculture/methods , Agricultural Crops/genetics , Population Genetics , Genomics , Breeding , Gene Frequency , Linkage Disequilibrium , Genetic Selection/genetics

19.

Integrating environmental covariates and crop modeling into the genomic selection framework to predict genotype by environment interactions.

Heslot, Nicolas; Akdemir, Deniz; Sorrells, Mark E; Jannink, Jean-Luc.

Theor Appl Genet ; 127(2): 463-80, 2014 Feb.

Article in English | MEDLINE | ID: mdl-24264761

ABSTRACT

KEY MESSAGE: Development of models to predict genotype by environment interactions, in unobserved environments, using environmental covariates, a crop model and genomic selection. Application to a large winter wheat dataset. Genotype by environment interaction (G*E) is one of the key issues when analyzing phenotypes. The use of environment data to model G*E has long been a subject of interest but is limited by the same problems as those addressed by genomic selection methods: a large number of correlated predictors each explaining a small amount of the total variance. In addition, non-linear responses of genotypes to stresses are expected to further complicate the analysis. Using a crop model to derive stress covariates from daily weather data for predicted crop development stages, we propose an extension of the factorial regression model to genomic selection. This model is further extended to the marker level, enabling the modeling of quantitative trait loci (QTL) by environment interaction (Q*E), on a genome-wide scale. A newly developed ensemble method, soft rule fit, was used to improve this model and capture non-linear responses of QTL to stresses. The method is tested using a large winter wheat dataset, representative of the type of data available in a large-scale commercial breeding program. Accuracy in predicting genotype performance in unobserved environments for which weather data were available increased by 11.1% on average and the variability in prediction accuracy decreased by 10.8%. By leveraging agronomic knowledge and the large historical datasets generated by breeding programs, this new model provides insight into the genetic architecture of genotype by environment interactions and could predict genotype performance based on past and future weather scenarios.

Subject(s)

Agricultural Crops/genetics , Gene-Environment Interaction , Genotype , Genetic Models

20.

Genetic analysis of cassava brown streak disease root necrosis using image analysis and genome-wide association studies.

Nandudu, Leah; Strock, Christopher; Ogbonna, Alex; Kawuki, Robert; Jannink, Jean-Luc.

Front Plant Sci ; 15: 1360729, 2024.

Article in English | MEDLINE | ID: mdl-38562560

ABSTRACT

Cassava brown streak disease (CBSD) poses a substantial threat to food security. To address this challenge, we used PlantCV to extract CBSD root necrosis image traits from 320 clones, with the aim of identifying associated genomic regions and candidate genes through genome-wide association studies (GWAS). Results revealed strong correlations among certain root necrosis image traits, such as necrotic area fraction and necrotic width fraction, as well as between the convex hull area of root necrosis and the percentage of necrosis. Low correlations were observed between CBSD scores obtained from the 1-5 scoring method and all root necrosis traits. Broad-sense heritability estimates of root necrosis image traits ranged from low to moderate, with the highest estimate of 0.42 observed for the percentage of necrosis, while narrow-sense heritability consistently remained low, ranging from 0.03 to 0.22. Leveraging data from 30,750 SNPs obtained through DArT genotyping, eight SNPs on chromosomes 1, 7, and 11 were identified and associated with both the ellipse eccentricity of root necrosis and the percentage of necrosis through GWAS. Candidate gene analysis in the 172.2 kb region on chromosome 1 revealed 24 potential genes with diverse functions, including ubiquitin-protein ligase, DNA-binding transcription factors, and RNA metabolism proteins, among others. Despite our initial expectation that image analysis objectivity would yield better heritability estimates and stronger genomic associations than the 1-5 scoring method, the results were unexpectedly lower. Further research is needed to comprehensively understand the genetic basis of these traits and their relevance to cassava breeding and disease management.
