1.
Mol Biol Evol ; 38(10): 4419-4434, 2021 09 27.
Article in English | MEDLINE | ID: mdl-34157722

ABSTRACT

Understanding the evolutionary history of crops, including identifying wild relatives, helps to provide insight for conservation and crop breeding efforts. Cultivated Brassica oleracea has intrigued researchers for centuries due to its wide diversity in forms, which include cabbage, broccoli, cauliflower, kale, kohlrabi, and Brussels sprouts. Yet, the evolutionary history of this species remains understudied. With such different vegetables produced from a single species, B. oleracea is a model organism for understanding the power of artificial selection. Persistent challenges in the study of B. oleracea include conflicting hypotheses regarding domestication and the identity of the closest living wild relative. Using newly generated RNA-seq data for a diversity panel of 224 accessions, which represents 14 different B. oleracea crop types and nine potential wild progenitor species, we integrate phylogenetic and population genetic techniques with ecological niche modeling, archaeological, and literary evidence to examine relationships among cultivars and wild relatives to clarify the origin of this horticulturally important species. Our analyses point to the Aegean endemic B. cretica as the closest living relative of cultivated B. oleracea, supporting an origin of cultivation in the Eastern Mediterranean region. Additionally, we identify several feral lineages, suggesting that cultivated plants of this species can revert to a wild-like state with relative ease. By expanding our understanding of the evolutionary history in B. oleracea, these results contribute to a growing body of knowledge on crop domestication that will facilitate continued breeding efforts including adaptation to changing environmental conditions.


Subject(s)
Brassica ; Plant Breeding ; Biological Evolution ; Brassica/genetics ; Crops, Agricultural/genetics ; Phylogeny
2.
BMC Plant Biol ; 22(1): 87, 2022 Feb 26.
Article in English | MEDLINE | ID: mdl-35219296

ABSTRACT

BACKGROUND: Genomic selection is a powerful tool in plant breeding. By building a prediction model using a training set with markers and phenotypes, genomic estimated breeding values (GEBVs) can be used as predictions of breeding values in a target set with only genotype data. There is, however, limited information on how prediction accuracy of genomic prediction can be optimized. The objective of this study was to evaluate the performance of 11 genomic prediction models across species in terms of prediction accuracy for two traits with different heritabilities using several subsets of markers and training population proportions. Species studied were maize (Zea mays, L.), soybean (Glycine max, L.), and rice (Oryza sativa, L.), which vary in linkage disequilibrium (LD) decay rates and have contrasting genetic architectures. RESULTS: Correlations between observed and predicted GEBVs were determined via cross validation for three training-to-testing proportions (90:10, 70:30, and 50:50). Maize, which has the shortest extent of LD, showed the highest prediction accuracy. Amongst all the models tested, Bayes B performed better than or equal to all other models for each trait in all the three crops. Traits with higher broad-sense and narrow-sense heritabilities were associated with higher prediction accuracy. When subsets of markers were selected based on LD, the accuracy was similar to that observed from the complete set of markers. However, prediction accuracies were significantly improved when using a subset of total markers that were significant at P ≤ 0.05 or P ≤ 0.10. As expected, exclusion of QTL-associated markers in the model reduced prediction accuracy. Prediction accuracy varied among different training population proportions. CONCLUSIONS: We conclude that prediction accuracy for genomic selection can be improved by using the Bayes B model with a subset of significant markers and by selecting the training population based on narrow sense heritability.
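
The evaluation described above (prediction accuracy as the correlation between predicted and observed values at 90:10, 70:30, and 50:50 training-to-testing proportions) can be sketched roughly as follows. This is a minimal illustration on simulated data, not the study's pipeline: ridge regression stands in for models such as Bayes B, a single random hold-out split replaces full cross-validation, and all names (`ridge_predict`, `holdout_accuracy`, the toy marker matrix) are hypothetical.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ridge_predict(X_train, y_train, X_test, lam=1.0):
    """Fit ridge regression on centered training data, then predict the test set."""
    n, p = len(X_train), len(X_train[0])
    col_mean = [sum(row[j] for row in X_train) / n for j in range(p)]
    y_mean = sum(y_train) / n
    Xc = [[row[j] - col_mean[j] for j in range(p)] for row in X_train]
    # Normal equations with a ridge penalty: (X'X + lam * I) beta = X'(y - mean)
    A = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * (yi - y_mean) for r, yi in zip(Xc, y_train)) for i in range(p)]
    beta = solve(A, b)
    return [y_mean + sum((row[j] - col_mean[j]) * beta[j] for j in range(p))
            for row in X_test]

def holdout_accuracy(X, y, train_fraction, seed=0):
    """Correlation between predicted and observed values in one random split."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    n_train = round(train_fraction * len(y))
    train, test = idx[:n_train], idx[n_train:]
    pred = ridge_predict([X[i] for i in train], [y[i] for i in train],
                         [X[i] for i in test])
    return pearson(pred, [y[i] for i in test])

# Simulated toy data (assumption): 200 lines, 30 markers coded 0/1/2.
rng = random.Random(42)
n_lines, n_markers = 200, 30
X = [[rng.choice((0, 1, 2)) for _ in range(n_markers)] for _ in range(n_lines)]
effects = [rng.gauss(0, 1) for _ in range(n_markers)]
y = [sum(g * a for g, a in zip(row, effects)) + rng.gauss(0, 2) for row in X]

accuracies = {f: holdout_accuracy(X, y, f) for f in (0.9, 0.7, 0.5)}
for fraction, acc in accuracies.items():
    print(f"training fraction {fraction:.1f}: accuracy {acc:.2f}")
```

In practice the split would be repeated many times and averaged, since a single hold-out estimate from a small test set is noisy.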


Subject(s)
Glycine max/genetics ; Models, Genetic ; Oryza/genetics ; Zea mays/genetics ; Genetic Markers ; Genome, Plant ; Linkage Disequilibrium ; Oryza/physiology ; Phenotype ; Plant Breeding ; Polymorphism, Single Nucleotide ; Glycine max/physiology ; Zea mays/physiology
3.
Plant Cell ; 31(9): 1968-1989, 2019 09.
Article in English | MEDLINE | ID: mdl-31239390

ABSTRACT

Premature senescence in annual crops reduces yield, while delayed senescence, termed stay-green, imposes positive and negative impacts on yield and nutrition quality. Despite its importance, scant information is available on the genetic architecture of senescence in maize (Zea mays) and other cereals. We combined a systematic characterization of natural diversity for senescence in maize and coexpression networks derived from transcriptome analysis of normally senescing and stay-green lines. Sixty-four candidate genes were identified by genome-wide association study (GWAS), and 14 of these genes are supported by additional evidence for involvement in senescence-related processes including proteolysis, sugar transport and signaling, and sink activity. Eight of the GWAS candidates, independently supported by a coexpression network underlying stay-green, include a trehalose-6-phosphate synthase, a NAC transcription factor, and two xylan biosynthetic enzymes. Source-sink communication and the activity of cell walls as a secondary sink emerge as key determinants of stay-green. Mutant analysis supports the role of a candidate encoding Cys protease in stay-green in Arabidopsis (Arabidopsis thaliana), and analysis of natural alleles suggests a similar role in maize. This study provides a foundation for enhanced understanding and manipulation of senescence for increasing carbon yield, nutritional quality, and stress tolerance of maize and other cereals.


Subject(s)
Aging/genetics ; Gene Expression Regulation, Plant ; Gene Regulatory Networks ; Genes, Plant/genetics ; Zea mays/genetics ; Arabidopsis/genetics ; Gene Expression Profiling ; Genome-Wide Association Study ; Glucosyltransferases/genetics ; Plant Leaves ; Polymorphism, Single Nucleotide ; Transcription Factors/genetics ; Transcriptome
4.
Plant Cell Physiol ; 62(7): 1199-1214, 2021 Oct 29.
Article in English | MEDLINE | ID: mdl-34015110

ABSTRACT

The strength of the stalk rind, measured as rind penetrometer resistance (RPR), is an important contributor to stalk lodging resistance. To better understand the genetic architecture of RPR, we combined selection mapping on populations developed by 15 cycles of divergent selection for high and low RPR with time-course transcriptomic and metabolic analyses of the stalks. Divergent selection significantly altered allele frequencies of 3,656 and 3,412 single-nucleotide polymorphisms (SNPs) in the high and low RPR populations, respectively. Surprisingly, only 110 (1.56%) SNPs under selection were common to both populations, while the majority (98.4%) were unique to each population. This result indicated that high and low RPR phenotypes are produced by biologically distinct mechanisms. Remarkably, regions harboring lignin and polysaccharide genes were preferentially selected in high and low RPR populations, respectively. The preferential selection was manifested as higher lignification and increased saccharification of the high and low RPR stalks, respectively. The evolution of distinct gene classes according to the direction of selection was unexpected in the context of parallel evolution and demonstrated that selection for a trait, albeit in different directions, does not necessarily act on the same genes. Tricin, a grass-specific monolignol that initiates the incorporation of lignin in the cell walls, emerged as a key determinant of RPR. Integration of selection mapping and transcriptomic analyses with published genetic studies of RPR identified several candidate genes including ZmMYB31, ZmNAC25, ZmMADS1, ZmEXPA2, ZmIAA41 and hk5. These findings provide a foundation for an enhanced understanding of RPR and the improvement of stalk lodging resistance.


Subject(s)
Zea mays/genetics ; Cell Wall/metabolism ; Evolution, Molecular ; Gene Expression Profiling ; Gene Frequency ; Metabolomics ; Polymorphism, Single Nucleotide/genetics ; Quantitative Trait, Heritable ; Zea mays/anatomy & histology
5.
BMC Plant Biol ; 19(1): 412, 2019 Oct 08.
Article in English | MEDLINE | ID: mdl-31590656

ABSTRACT

BACKGROUND: Genome-wide association studies (GWAS) are a powerful tool for identifying quantitative trait loci (QTL) and causal single nucleotide polymorphisms (SNPs)/genes associated with various important traits in crop species. Typically, GWAS in crops are performed using a panel of inbred lines, where multiple replicates of the same inbred are measured and the average phenotype is taken as the response variable. Here we describe and evaluate single plant GWAS (sp-GWAS) for performing a GWAS on individual plants, which does not require an association panel of inbreds. Instead, sp-GWAS relies on the phenotypes and genotypes from individual plants sampled from a randomly mating population. Importantly, we demonstrate how sp-GWAS can be efficiently combined with a bulk segregant analysis (BSA) experiment to rapidly corroborate evidence for significant SNPs. RESULTS: In this study we used the Shoepeg maize landrace, collected as an open-pollinated variety from a farm in Southern Missouri in the 1960s, to evaluate whether sp-GWAS coupled with BSA can be efficiently and powerfully used to detect significant association of SNPs for plant height (PH). Plants were grown in 8 locations across two years and in total 768 individuals were genotyped and phenotyped for sp-GWAS. A total of 306k polymorphic markers in 768 individuals evaluated via association analysis detected 25 significant SNPs (P ≤ 0.00001) for PH. The results from our single-plant GWAS were further validated by bulk segregant analysis (BSA) for PH. BSA sequencing was performed on the same population by selecting tall and short plants as separate bulks. This approach identified 37 genomic regions for plant height. Of the 25 significant SNPs from GWAS, the three most significant SNPs co-localize with regions identified by BSA. CONCLUSION: Overall, this study demonstrates that sp-GWAS coupled with BSA can be a useful tool for detecting significant SNPs and identifying candidate genes. This result is particularly useful for species/populations where association panels are not readily available.
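
To make the single-plant idea concrete, the sketch below runs a marker-by-marker scan in which each individual plant contributes one genotype and one height measurement, and flags SNPs at the P ≤ 0.00001 threshold quoted above. It is only an illustration under stated assumptions: the data are simulated, the test is a plain linear regression with a normal approximation to the t statistic (reasonable for 768 plants), and no correction for location, year, or population structure is attempted. The function name `association_pvalue` is hypothetical and does not come from the study.

```python
import math
import random

def association_pvalue(genotypes, phenotypes):
    """Two-sided p-value for a simple linear regression of phenotype on
    genotype dosage, using a normal approximation to the t statistic."""
    n = len(genotypes)
    mg, mp = sum(genotypes) / n, sum(phenotypes) / n
    sgg = sum((g - mg) ** 2 for g in genotypes)
    spp = sum((p - mp) ** 2 for p in phenotypes)
    if sgg == 0 or spp == 0:
        return 1.0  # monomorphic marker or constant trait: no test possible
    sgp = sum((g - mg) * (p - mp) for g, p in zip(genotypes, phenotypes))
    r = sgp / math.sqrt(sgg * spp)
    r2 = min(r * r, 1.0 - 1e-12)  # guard against division by zero
    t = r * math.sqrt((n - 2) / (1.0 - r2))
    return math.erfc(abs(t) / math.sqrt(2.0))

# Simulated toy population (assumption): 768 plants, 200 SNPs, one causal SNP.
rng = random.Random(7)
n_plants, n_snps, causal = 768, 200, 17
geno = [[rng.choice((0, 1, 2)) for _ in range(n_plants)] for _ in range(n_snps)]
height = [200.0 + 5.0 * geno[causal][i] + rng.gauss(0, 10) for i in range(n_plants)]

pvals = [association_pvalue(snp, height) for snp in geno]
significant = [s for s, p in enumerate(pvals) if p <= 1e-5]
top_snp = min(range(n_snps), key=lambda s: pvals[s])
print("most significant SNP:", top_snp, "significant SNPs:", significant)
```

A real analysis would typically use a mixed model with covariates; the point here is only that the unit of observation is the individual plant rather than an inbred-line mean.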


Subject(s)
Genome-Wide Association Study/methods ; Polymorphism, Single Nucleotide/genetics ; Zea mays/genetics ; Chromosomes, Plant/genetics ; Genome, Plant/genetics ; Linkage Disequilibrium/genetics ; Quantitative Trait Loci/genetics
6.
Theor Appl Genet ; 128(3): 529-38, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25575839

ABSTRACT

KEY MESSAGE: Natural variation for the timing of vegetative phase change in maize is controlled by several large effect loci, one corresponding to Glossy15, a gene known for regulating juvenile tissue traits. Vegetative phase change is an intrinsic component of developmental programs in plants. Juvenile and adult vegetative tissues in grasses differ dramatically in their anatomical and biochemical composition affecting the utility of specific genotypes as animal feed and biofuel feedstock. The molecular network controlling the process of developmental transition is incompletely characterized. In this study, we used scoring for juvenile and adult epicuticular wax as an entry point to discover quantitative trait loci (QTL) controlling phenotypic variation for the developmental timing of juvenile to adult transition in maize. We scored the last leaf with juvenile wax on 25 recombinant inbred line families of the B73 reference Nested Association Mapping (NAM) population and the intermated B73×Mo17 (IBM) population across multiple seasons. A total of 13 unique QTL were identified through genome-wide association analysis across the NAM populations, three of which have large effects. A QTL located on chromosome nine had the most significant SNPs within Glossy15, a gene controlling expression of juvenile leaf traits. The second large effect QTL is located on chromosome two. The most significant SNP in this QTL is located adjacent to a homolog of the Arabidopsis transcription factor, enhanced downy mildew-2, which has been shown to promote the transition from juvenile to adult vegetative phase. Overall, these results show that several major QTL and potential candidate genes underlie the extensive natural variation for this developmental trait.


Subject(s)
Chromosome Mapping ; Quantitative Trait Loci ; Zea mays/genetics ; Genes, Plant ; Genetic Association Studies ; Genetic Variation ; Phenotype ; Polymorphism, Single Nucleotide ; Zea mays/growth & development
7.
Genet Sel Evol ; 47: 30, 2015 Apr 17.
Article in English | MEDLINE | ID: mdl-25928167

ABSTRACT

BACKGROUND: High-density genomic data is often analyzed by combining information over windows of adjacent markers. Interpretation of data grouped in windows versus at individual locations may increase statistical power, simplify computation, reduce sampling noise, and reduce the total number of tests performed. However, use of adjacent marker information can result in over- or under-smoothing, undesirable window boundary specifications, or highly correlated test statistics. We introduce a method for defining windows based on statistically guided breakpoints in the data, as a foundation for the analysis of multiple adjacent data points. This method involves first fitting a cubic smoothing spline to the data and then identifying the inflection points of the fitted spline, which serve as the boundaries of adjacent windows. This technique does not require prior knowledge of linkage disequilibrium, and therefore can be applied to data collected from individual or pooled sequencing experiments. Moreover, in contrast to existing methods, an arbitrary choice of window size is not necessary, since these are determined empirically and allowed to vary along the genome. RESULTS: Simulations applying this method were performed to identify selection signatures from pooled sequencing FST data, for which allele frequencies were estimated from a pool of individuals. The relative ratio of true to false positives was twice that generated by existing techniques. A comparison of the approach to a previous study that involved pooled sequencing FST data from maize suggested that outlying windows were more clearly separated from their neighbors than when using a standard sliding window approach. CONCLUSIONS: We have developed a novel technique to identify window boundaries for subsequent analysis protocols. When applied to selection studies based on FST data, this method provides a high discovery rate and minimizes false positives. The method is implemented in the R package GenWin, which is publicly available from CRAN.
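
The core idea (smooth the per-marker signal, then place window boundaries where the smoothed curve changes curvature) can be illustrated with a short sketch. This is not the GenWin implementation: a centered moving average stands in for the cubic smoothing spline, inflection points are located as sign changes of the discrete second difference, evenly spaced positions are assumed, and the function names are hypothetical. A noise-free sine curve is used as the toy signal because its inflection points are known to fall at multiples of π.

```python
import math

def moving_average(values, half_width):
    """Centered moving average, returned only where the full window fits."""
    width = 2 * half_width + 1
    return [sum(values[i - half_width:i + half_width + 1]) / width
            for i in range(half_width, len(values) - half_width)]

def window_boundaries(positions, values, half_width=5):
    """Positions where the smoothed curve changes curvature.

    Assumes evenly spaced positions. Each boundary is reported as the
    midpoint between the two positions that bracket the sign change of
    the second difference."""
    smooth = moving_average(values, half_width)
    pos = positions[half_width:len(positions) - half_width]
    second_diff = [smooth[i - 1] - 2.0 * smooth[i] + smooth[i + 1]
                   for i in range(1, len(smooth) - 1)]
    diff_pos = pos[1:-1]
    boundaries = []
    for i in range(1, len(second_diff)):
        if second_diff[i - 1] * second_diff[i] < 0:
            boundaries.append(0.5 * (diff_pos[i - 1] + diff_pos[i]))
    return boundaries

# Toy signal (assumption): a sine curve sampled every 0.01 units from 0 to 10.
positions = [i * 0.01 for i in range(1001)]
signal = [math.sin(x) for x in positions]
boundaries = window_boundaries(positions, signal)
print([round(b, 3) for b in boundaries])
```

Adjacent boundaries then delimit windows of varying width, and a per-window summary (for example, the mean of the raw statistic) can be computed for each.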


Subject(s)
Genomics/methods ; Data Interpretation, Statistical ; Gene Frequency ; Zea mays/genetics
8.
G3 (Bethesda) ; 13(2), 2023 02 09.
Article in English | MEDLINE | ID: mdl-36454082

ABSTRACT

Identifying selection on polygenic complex traits in crops and livestock is important for understanding evolution and helps prioritize important characteristics for breeding. Quantitative trait loci (QTL) that contribute to polygenic trait variation often exhibit small or infinitesimal effects. This hinders the ability to detect QTL controlling polygenic traits because enormously high statistical power is needed for their detection. Recently, we circumvented this challenge by introducing a method to identify selection on complex traits by evaluating the relationship between genome-wide changes in allele frequency and estimates of effect size. The approach involves calculating a composite statistic across all markers that captures this relationship, followed by implementing a linkage disequilibrium-aware permutation test to evaluate if the observed pattern differs from that expected due to drift during evolution and population stratification. In this manuscript, we describe "Ghat," an R package developed to implement this method to test for selection on polygenic traits. We demonstrate the package by applying it to test for polygenic selection on 15 published European wheat traits including yield, biomass, quality, morphological characteristics, and disease resistance traits. Moreover, we applied Ghat to different simulated populations with different breeding histories and genetic architectures. The results highlight the power of Ghat to identify selection on complex traits. The Ghat package is accessible on CRAN, the Comprehensive R Archive Network, and on GitHub.
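
A rough sketch of the two steps named above (a composite statistic relating allele-frequency change to effect size, followed by a permutation test) is given below. It rests on explicit assumptions: the composite statistic is taken to be the sum over markers of allele-frequency change times estimated effect, and the permutation simply shuffles effects across markers, ignoring linkage disequilibrium, whereas the abstract describes an LD-aware test. The function names are hypothetical and are not the Ghat package API.

```python
import random

def composite_stat(delta_freq, effects):
    """Sum over markers of (allele-frequency change) x (estimated effect)."""
    return sum(d * a for d, a in zip(delta_freq, effects))

def permutation_pvalue(delta_freq, effects, n_perm=2000, seed=1):
    """Two-sided permutation p-value; effects are shuffled across markers.
    This naive shuffle ignores LD between markers (simplifying assumption)."""
    rng = random.Random(seed)
    observed = composite_stat(delta_freq, effects)
    shuffled = list(effects)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(composite_stat(delta_freq, shuffled)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Simulated toy data (assumption): 500 markers whose frequency changes
# track their effects, as expected under directional selection.
rng = random.Random(3)
effects = [rng.gauss(0, 1) for _ in range(500)]
delta_freq = [0.02 * a + rng.gauss(0, 0.02) for a in effects]

observed, p_value = permutation_pvalue(delta_freq, effects)
print(f"composite statistic = {observed:.2f}, permutation p = {p_value:.4f}")
```

A positive statistic with a small p-value indicates that alleles with positive estimated effects tended to increase in frequency more than expected under the shuffled null.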


Subject(s)
Multifactorial Inheritance ; Plant Breeding ; Multifactorial Inheritance/genetics ; Quantitative Trait Loci ; Linkage Disequilibrium ; Gene Frequency ; Phenotype
9.
G3 (Bethesda) ; 13(4), 2023 04 11.
Article in English | MEDLINE | ID: mdl-36625555

ABSTRACT

Accurate prediction of the phenotypic outcomes produced by different combinations of genotypes, environments, and management interventions remains a key goal in biology with direct applications to agriculture, research, and conservation. The past decades have seen an expansion of new methods applied toward this goal. Here we predict maize yield using deep neural networks, compare the efficacy of 2 model development methods, and contextualize model performance using conventional linear and machine learning models. We examine the usefulness of incorporating interactions between disparate data types. We find deep learning and best linear unbiased predictor (BLUP) models with interactions had the best overall performance. BLUP models achieved the lowest average error, but deep learning models performed more consistently with similar average error. Optimizing deep neural network submodules for each data type improved model performance relative to optimizing the whole model for all data types at once. Examining the effect of interactions in the best-performing model revealed that including interactions altered the model's sensitivity to weather and management features, including a reduction of the importance scores for timepoints expected to have a limited physiological basis for influencing yield, namely those at the extreme end of the season, nearly 200 days post planting. Based on these results, deep learning provides a promising avenue for the phenotypic prediction of complex traits in complex environments and a potential mechanism to better understand the influence of environmental and genetic factors.


Subject(s)
Deep Learning ; Neural Networks, Computer ; Machine Learning ; Genotype ; Multifactorial Inheritance
10.
BMC Res Notes ; 16(1): 148, 2023 Jul 17.
Article in English | MEDLINE | ID: mdl-37461058

ABSTRACT

OBJECTIVES: The Genomes to Fields (G2F) 2022 Maize Genotype by Environment (GxE) Prediction Competition aimed to develop models for predicting grain yield for the 2022 Maize GxE project field trials, leveraging the datasets previously generated by this project and other publicly available data. DATA DESCRIPTION: This resource used data from the Maize GxE project within the G2F Initiative [1]. The dataset included phenotypic and genotypic data of the hybrids evaluated in 45 locations from 2014 to 2022. It also included soil, weather, and environmental covariate data, as well as metadata for all environments (combinations of year and location). Competitors also had access to ReadMe files which described all the files provided. The Maize GxE is a collaborative project and all the data generated becomes publicly available [2]. The dataset used in the 2022 Prediction Competition was curated and lightly filtered for quality and to ensure naming uniformity across years.


Subject(s)
Genome, Plant ; Zea mays ; Phenotype ; Zea mays/genetics ; Genotype ; Genome, Plant/genetics ; Edible Grain/genetics
11.
BMC Genom Data ; 24(1): 29, 2023 05 25.
Article in English | MEDLINE | ID: mdl-37231352

ABSTRACT

OBJECTIVES: This report provides information about the public release of the 2018-2019 Maize GxE project datasets of the Genomes to Fields (G2F) Initiative. G2F is an umbrella initiative that evaluates maize hybrids and inbred lines across multiple environments and makes available phenotypic, genotypic, environmental, and metadata information. The initiative understands the necessity to characterize and deploy public sources of genetic diversity to face the challenges for more sustainable agriculture in the context of variable environmental conditions. DATA DESCRIPTION: Datasets include phenotypic, climatic, and soil measurements, metadata information, and inbred genotypic information for each combination of location and year. Collaborators in the G2F initiative collected data for each location and year; members of the group responsible for coordination and data processing combined all the collected information and removed obviously erroneous data. The collaborators received the data before the DOI release to verify and declare that the data generated in their own locations was accurate. ReadMe and description files are available for each dataset. Previous years of evaluation are already publicly available, with common hybrids present to connect across all locations and years evaluated since this project's inception.


Subject(s)
Genome, Plant ; Zea mays ; Phenotype ; Zea mays/genetics ; Seasons ; Genotype ; Genome, Plant/genetics
12.
BMC Res Notes ; 16(1): 219, 2023 Sep 14.
Article in English | MEDLINE | ID: mdl-37710302

ABSTRACT

OBJECTIVES: This release note describes the Maize GxE project datasets within the Genomes to Fields (G2F) Initiative. The Maize GxE project aims to understand genotype by environment (GxE) interactions and use the information collected to improve resource allocation efficiency and increase genotype predictability and stability, particularly in scenarios of variable environmental patterns. Hybrids and inbreds are evaluated across multiple environments and phenotypic, genotypic, environmental, and metadata information are made publicly available. DATA DESCRIPTION: The datasets include phenotypic data of the hybrids and inbreds evaluated in 30 locations across the US and one location in Germany in 2020 and 2021, soil and climatic measurements and metadata information for all environments (combination of year and location), ReadMe, and description files for each data type. A set of common hybrids is present in each environment to connect with previous evaluations. Each environment had a collaborator responsible for collecting and submitting the data; the GxE coordination team combined all the collected information and removed obviously erroneous data. Collaborators received the combined data to use, verify and declare that the data generated in their own environments was accurate. Combined data is released to the public with minimal filtering to maintain fidelity to the original data.


Subject(s)
Resource Allocation ; Zea mays ; Zea mays/genetics ; Seasons ; Genotype ; Germany
13.
Genet Sel Evol ; 44: 29, 2012 Sep 25.
Article in English | MEDLINE | ID: mdl-23009363

ABSTRACT

BACKGROUND: Most Bayesian models for the analysis of complex traits are not analytically tractable and inferences are based on computationally intensive techniques. This is true of Bayesian models for genome-enabled selection, which uses whole-genome molecular data to predict the genetic merit of candidate animals for breeding purposes. In this regard, parallel computing can overcome the bottlenecks that can arise from serial computing. Hence, a major goal of the present study is to bridge the gap to high-performance Bayesian computation in the context of animal breeding and genetics. RESULTS: Parallel Markov chain Monte Carlo algorithms and strategies are described in the context of animal breeding and genetics. Parallel Monte Carlo algorithms are introduced as a starting point including their applications to computing single-parameter and certain multiple-parameter models. Then, two basic approaches for parallel Markov chain Monte Carlo are described: one aims at parallelization within a single chain; the other is based on running multiple chains, yet some variants are discussed as well. Features and strategies of the parallel Markov chain Monte Carlo are illustrated using real data, including a large beef cattle dataset with 50K SNP genotypes. CONCLUSIONS: Parallel Markov chain Monte Carlo algorithms are useful for computing complex Bayesian models; they not only lead to a dramatic speedup in computing but can also be used to optimize model parameters in complex Bayesian models. Hence, we anticipate that use of parallel Markov chain Monte Carlo will have a profound impact on revolutionizing the computational tools for genomic selection programs.


Subject(s)
Animals, Domestic/genetics ; Breeding/methods ; Models, Genetic ; Animals ; Bayes Theorem ; Markov Chains ; Monte Carlo Method
14.
Plant Genome ; 15(4): e20257, 2022 12.
Article in English | MEDLINE | ID: mdl-36258672

ABSTRACT

Low-density genotyping followed by imputation reduces genotyping costs while still providing high-density marker information. An increased marker density has the potential to improve the outcome of all applications that are based on genomic data. This study investigates techniques for 1k to 20k genomic marker imputation for plant breeding programs with sugar beet (Beta vulgaris L. ssp. vulgaris) as an example crop, where these are realistic marker numbers for modern breeding applications. The generally accepted 'gold standard' for imputation, Beagle 5.1, was compared with the recently developed software AlphaPlantImpute2 which is designed specifically for plant breeding. For Beagle 5.1 and AlphaPlantImpute2, the imputation strategy as well as the imputation parameters were optimized in this study. We found that the imputation accuracy of Beagle could be tremendously improved (0.22 to 0.67) by tuning parameters, mainly by lowering the values for the parameter for the effective population size and increasing the number of iterations performed. Separating the phasing and imputation steps also improved accuracies when optimized parameters were used (0.67 to 0.82). We also found that the imputation accuracy of Beagle decreased when more low-density lines were included for imputation. AlphaPlantImpute2 produced very high accuracies without optimization (0.89) and was generally less responsive to optimization. Overall, AlphaPlantImpute2 performed relatively better for imputation whereas Beagle was better for phasing. Combining both tools yielded the highest accuracies.


Subject(s)
Beta vulgaris ; Dogs ; Animals ; Beta vulgaris/genetics ; Genotype ; Plant Breeding ; Polymorphism, Single Nucleotide ; Sugars
15.
G3 (Bethesda) ; 12(11), 2022 11 04.
Article in English | MEDLINE | ID: mdl-36124944

ABSTRACT

We introduce the R-package learnMET, developed as a flexible framework to enable a collection of analyses on multi-environment trial breeding data with machine learning-based models. learnMET allows the combination of genomic information with environmental data such as climate and/or soil characteristics. Notably, the package offers the possibility of incorporating weather data from field weather stations, or of retrieving global meteorological datasets from a NASA database. Daily weather data can be aggregated over specific periods of time based on naive (for instance, nonoverlapping 10-day windows) or phenological approaches. Different machine learning methods for genomic prediction are implemented, including gradient-boosted decision trees, random forests, stacked ensemble models, and multilayer perceptrons. These prediction models can be evaluated in a user-friendly way via a collection of cross-validation schemes that mimic typical scenarios encountered by plant breeders working with multi-environment trial experimental data. The package is published under an MIT license and accessible on GitHub.
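
The "naive" aggregation mentioned above (nonoverlapping 10-day windows) amounts to averaging consecutive blocks of daily values. The sketch below shows that idea in plain Python; it does not use or reproduce the learnMET API (learnMET is an R package), the function name `aggregate_windows` is hypothetical, and dropping an incomplete final window is a choice made only for this example.

```python
def aggregate_windows(daily_values, window=10):
    """Mean of each consecutive, nonoverlapping block of `window` days.

    An incomplete final block is dropped (a choice made for this sketch)."""
    n_full = len(daily_values) // window
    return [sum(daily_values[k * window:(k + 1) * window]) / window
            for k in range(n_full)]

# Toy daily series (assumption): 25 days of made-up temperature values.
daily_temperature = [float(day) for day in range(1, 26)]
window_means = aggregate_windows(daily_temperature)
print(window_means)  # → [5.5, 15.5]
```

Each window mean can then serve as one environmental covariate per trial; a phenological approach would instead set the window limits from crop growth stages.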


Subject(s)
Genomics ; Machine Learning ; Genomics/methods ; Neural Networks, Computer
16.
Front Plant Sci ; 12: 699589, 2021.
Article in English | MEDLINE | ID: mdl-34880880

ABSTRACT

The development of crop varieties with stable performance in future environmental conditions represents a critical challenge in the context of climate change. Environmental data collected at the field level, such as soil and climatic information, can be relevant to improve predictive ability in genomic prediction models by describing more precisely genotype-by-environment interactions, which represent a key component of the phenotypic response for complex crop agronomic traits. Modern predictive modeling approaches can efficiently handle various data types and are able to capture complex nonlinear relationships in large datasets. In particular, machine learning techniques have gained substantial interest in recent years. Here we examined the predictive ability of machine learning-based models for two phenotypic traits in maize using data collected by the Maize Genomes to Fields (G2F) Initiative. The data we analyzed consisted of multi-environment trials (METs) dispersed across the United States and Canada from 2014 to 2017. An assortment of soil- and weather-related variables was derived and used in prediction models alongside genotypic data. Linear random effects models were compared to a linear regularized regression method (elastic net) and to two nonlinear gradient boosting methods based on decision tree algorithms (XGBoost, LightGBM). These models were evaluated under four prediction problems: (1) tested and new genotypes in a new year; (2) only unobserved genotypes in a new year; (3) tested and new genotypes in a new site; (4) only unobserved genotypes in a new site. Accuracy in forecasting grain yield performance of new genotypes in a new year was improved by up to 20% over the baseline model by including environmental predictors with gradient boosting methods. For plant height, an enhancement of predictive ability could neither be observed by using machine learning-based methods nor by using detailed environmental information. An investigation of key environmental factors using gradient boosting frameworks also revealed that temperature at flowering stage, frequency and amount of water received during the vegetative and grain filling stage, and soil organic matter content appeared as important predictors for grain yield in our panel of environments.

17.
G3 (Bethesda) ; 10(11): 4227-4239, 2020 11 05.
Article in English | MEDLINE | ID: mdl-32978264

ABSTRACT

Plant growth, development, and nutritional quality depend upon amino acid homeostasis, especially in seeds. However, our understanding of the underlying genetics influencing amino acid content and composition remains limited, with only a few candidate genes and quantitative trait loci identified to date. Improved knowledge of the genetics and biological processes that determine amino acid levels will enable researchers to use this information for plant breeding and biological discovery. Toward this goal, we used genomic prediction to identify biological processes that are associated with, and therefore potentially influence, free amino acid (FAA) composition in seeds of the model plant Arabidopsis thaliana. Markers were split into categories based on metabolic pathway annotations and fit using a genomic partitioning model to evaluate the influence of each pathway on heritability explained, model fit, and predictive ability. Selected pathways included processes known to influence FAA composition, albeit to an unknown degree, and spanned four categories: amino acid, core, specialized, and protein metabolism. Using this approach, we identified associations for pathways containing known variants for FAA traits, in addition to finding new trait-pathway associations. Markers related to amino acid metabolism, which are directly involved in FAA regulation, improved predictive ability for branched chain amino acids and histidine. The use of genomic partitioning also revealed patterns across biochemical families, in which serine-derived FAAs were associated with protein related annotations and aromatic FAAs were associated with specialized metabolic pathways. Taken together, these findings provide evidence that genomic partitioning is a viable strategy to uncover the relative contributions of biological processes to FAA traits in seeds, offering a promising framework to guide hypothesis testing and narrow the search space for candidate genes.


Subject(s)
Arabidopsis , Biological Phenomena , Amino Acids , Arabidopsis/genetics , Genomics , Humans , Plant Breeding , Seeds/genetics
18.
Curr Opin Plant Biol ; 54: 93-100, 2020 04.
Article in English | MEDLINE | ID: mdl-32325397

ABSTRACT

Crop domestication is a fascinating area of study, as shown by a multitude of recent reviews. Coupled with the increasing availability of genomic and phenomic resources in numerous crop species, insights from evolutionary biology will enable a deeper understanding of the genetic architecture and short-term evolution of complex traits, which can be used to inform selection strategies. Future advances in crop improvement will rely on the integration of population genetics with plant breeding methodology, and the development of community resources to support research in a variety of crop life histories and reproductive strategies. We highlight recent advances related to the role of selective sweeps and demographic history in shaping genetic architecture, how these breakthroughs can inform selection strategies, and the application of precision gene editing to leverage these connections.


Subject(s)
Domestication , Plant Breeding , Breeding , Gene Editing , Plants/genetics
19.
Front Plant Sci ; 10: 1794, 2019.
Article in English | MEDLINE | ID: mdl-32158452

ABSTRACT

Association mapping (AM) is a powerful tool for fine mapping complex trait variation down to nucleotide sequences by exploiting historical recombination events. A major problem in AM is controlling false positives that can arise from population structure and family relatedness. False positives are often controlled by incorporating covariates for structure and kinship in mixed linear models (MLM). These MLM-based methods are single-locus models and can introduce false negatives due to overfitting of the model. In this study, eight different statistical models, ranging from single-locus to multilocus, were compared for AM for three traits differing in heritability in two crop species: soybean (Glycine max L.) and maize (Zea mays L.). Soybean and maize were chosen, in part, due to their highly differentiated rate of linkage disequilibrium (LD) decay, which can influence false positive and false negative rates. The fixed and random model circulating probability unification (FarmCPU) performed better than other models based on an analysis of Q-Q plots and on the identification of the known number of quantitative trait loci (QTLs) in a simulated data set. These results indicate that the FarmCPU controls both false positives and false negatives. Six qualitative traits in soybean with known published genomic positions were also used to compare these models, and results indicated that the FarmCPU consistently identified a single highly significant SNP closest to these known published genes. Multiple comparison adjustments (Bonferroni, false discovery rate, and positive false discovery rate) were compared for these models using a simulated trait having 60% heritability and 20 QTLs. Multiple comparison adjustments were overly conservative for MLM, CMLM, ECMLM, and MLMM and did not find any significant markers; in contrast, ANOVA, GLM, and SUPER models found an excessive number of markers, far more than 20 QTLs. The FarmCPU model, using less conservative methods (false discovery rate and positive false discovery rate), identified 10 QTLs, which was closer to the simulated number of QTLs than the number found by other models.

20.
Plant Methods ; 13: 8, 2017.
Article in English | MEDLINE | ID: mdl-28250803

ABSTRACT

BACKGROUND: High-density marker panels and/or whole-genome sequencing, coupled with advanced phenotyping pipelines and sophisticated statistical methods, have dramatically increased our ability to generate lists of candidate genes or regions that are putatively associated with phenotypes or processes of interest. However, the speed with which we can validate genes, or even make reasonable biological interpretations about the principles underlying them, has not kept pace. A promising approach that runs parallel to explicitly validating individual genes is analyzing a set of genes together and assessing the biological similarities among them. This is often achieved via gene ontology analysis, a powerful tool that involves evaluating publicly available gene annotations. However, additional resources such as Medical Subject Headings (MeSH) can also be used to evaluate sets of genes to make biological interpretations. RESULTS: In this manuscript, we describe utilizing MeSH terms to make biological interpretations in maize. MeSH terms are assigned to PubMed-indexed manuscripts by the National Library of Medicine, and can be directly mapped to genes to develop gene annotations. Once mapped, these terms can be evaluated for enrichment in sets of genes or similarity between gene sets to provide biological insights. Here, we implement MeSH analyses in five maize datasets to demonstrate how MeSH can be leveraged by the maize and broader crop-genomics community. CONCLUSIONS: We demonstrate that MeSH terms can be effectively leveraged to generate hypotheses and make biological interpretations in maize, and we provide a pipeline that enables the use of MeSH terms in other plant species.
