ABSTRACT
Although plants harbor a huge phytochemical diversity, only a fraction of plant metabolites is functionally characterized. In this work, we aimed to identify the genetic basis of metabolite functions during harsh environmental conditions in Arabidopsis thaliana. With machine learning algorithms we predicted stress-specific metabolomes for 23 (a)biotic stress phenotypes of 300 natural Arabidopsis accessions. The prediction models identified several aliphatic glucosinolates (GSLs) and their breakdown products to be implicated in responses to heat stress in siliques and herbivory by Western flower thrips, Frankliniella occidentalis. Bivariate GWA mapping of the metabolome predictions and their respective (a)biotic stress phenotypes revealed genetic associations with MAM, AOP, and GS-OH, all three involved in aliphatic GSL biosynthesis. We, therefore, investigated thrips herbivory on AOP, MAM, and GS-OH loss-of-function and/or overexpression lines. Arabidopsis accessions with a combination of MAM2 and AOP3, leading to 3-hydroxypropyl dominance, suffered less from thrips feeding damage. The requirement of MAM2 for this effect could, however, not be confirmed with an introgression line of ecotypes Cvi and Ler, most likely due to other, unknown susceptibility factors in the Ler background. However, AOP2 and GS-OH, adding alkenyl or hydroxy-butenyl groups, respectively, did not have major effects on thrips feeding. Overall, this study illustrates the complex implications of aliphatic GSL diversity in plant responses to heat stress and a cell-content-feeding herbivore.
Subjects
Arabidopsis, Glucosinolates, Heat-Shock Response, Thysanoptera, Glucosinolates/metabolism, Arabidopsis/genetics, Arabidopsis/physiology, Animals, Thysanoptera/physiology, Herbivory, Metabolome, Arabidopsis Proteins/genetics, Arabidopsis Proteins/metabolism, Phenotype, Chromosome Mapping
ABSTRACT
In the past decades, genomic prediction has had a large impact on plant breeding. Given the current advances in high-throughput phenotyping and sequencing technologies, it is increasingly common to observe a large number of traits in addition to the target trait of interest. This raises the important question of whether these additional or "secondary" traits can be used to improve genomic prediction for the target trait. With only a small number of secondary traits, this is known to be the case, given sufficiently high heritabilities and genetic correlations. Here we focus on the more challenging situation with a large number of secondary traits, which is increasingly common since the arrival of high-throughput phenotyping. In this case, secondary traits are usually incorporated through additional relatedness matrices. This approach is, however, infeasible when secondary traits are not measured on the test set, and it cannot distinguish between genetic and non-genetic correlations. An alternative direction is to extend the classical selection indices using penalized regression. So far, penalized selection indices have not been applied in a genomic prediction setting, and they require plot-level data in order to reliably estimate genetic correlations. Here we aim to overcome these limitations, using two novel approaches. Our first approach relies on a dimension reduction of the secondary traits, using either penalized regression or random forests (LS-BLUP/RF-BLUP). We then compute the bivariate GBLUP with the dimension reduction as the secondary trait. For simulated data (with available plot-level data), we also use bivariate GBLUP with the penalized selection index as the secondary trait (SI-BLUP). In our second approach (GM-BLUP), we follow existing multi-kernel methods but replace secondary traits by their genomic predictions, with the advantage that genomic prediction is also possible when secondary traits are only measured on the training set.
For most of our simulated data, SI-BLUP was most accurate, often closely followed by RF-BLUP or LS-BLUP. In real datasets, involving metabolites in Arabidopsis and transcriptomics in maize, no method could substantially improve over univariate prediction when secondary traits were only available on the training set. LS-BLUP and RF-BLUP were most accurate when secondary traits were available also for the test set.
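The dimension-reduction step described above can be illustrated with a minimal sketch: many secondary traits are collapsed into a single score by regressing the target trait on them with a penalty, and that score would then serve as the one secondary trait in a bivariate model. The sketch below assumes a ridge penalty purely for simplicity (the abstract says only "penalized regression"), uses hypothetical function names, and does not implement the bivariate GBLUP step or any tuning of the penalty.

```python
# Sketch only: collapse many secondary traits into one score via ridge regression.
# The ridge penalty and all function names are illustrative assumptions,
# not the authors' implementation.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_fit(X, y, lam):
    """Return (intercept, coefficients) minimising ||y - b0 - X b||^2 + lam * ||b||^2."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centre the data so the intercept is left unpenalised.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    xtx = [[sum(Xc[i][j] * Xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    xty = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    beta = solve(xtx, xty)
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta


def secondary_score(X, intercept, beta):
    """Single synthetic secondary trait: the fitted linear combination per genotype."""
    return [intercept + sum(b * v for b, v in zip(beta, row)) for row in X]


# Toy data: rows are genotypes, columns are two secondary traits.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [1.0, 3.0, 0.0, 2.0, 4.0]  # target trait, here exactly 1 + 2*x1 - x2
b0, beta = ridge_fit(X, y, lam=1.0)
score = secondary_score(X, b0, beta)
```

A larger `lam` shrinks the coefficients toward zero, which is what keeps the score stable when the number of secondary traits is large relative to the number of genotypes.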
ABSTRACT
Natural variation has become a prime resource to identify genetic variants that contribute to phenotypic variation. The regional mapping (RegMap) population is one of the most important populations for studying natural variation in Arabidopsis thaliana, and has been used in a large number of association studies and in studies on climatic adaptation. However, only 413 RegMap accessions have been completely sequenced, as part of the 1001 Genomes (1001G) Project, while the remaining 894 accessions have only been genotyped with the Affymetrix 250k chip. As a consequence, most association studies involving the RegMap are either restricted to the sequenced accessions, reducing power, or rely on a limited set of SNPs. Here we impute millions of SNPs to the 894 accessions that are exclusive to the RegMap, using the 1135 accessions of the 1001G Project as the reference panel. We assess imputation accuracy using a novel cross-validation scheme, which we show provides a more reliable measure of accuracy than existing methods. After filtering out low-accuracy SNPs, we obtain high-quality genotypic information for 2029 accessions and 3 million markers. To illustrate the benefits of these imputed data, we repeated genome-wide association studies on five stress-related traits and could identify novel candidate genes.
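The general idea of measuring imputation accuracy and then filtering SNPs can be sketched as follows: hide genotype calls whose true value is known, impute them, and score each SNP by the fraction of hidden calls recovered. This is a generic masking check, not the novel cross-validation scheme the abstract refers to; the majority-allele imputer is a deliberately naive placeholder, and all function names and the 0.5 threshold are hypothetical.

```python
# Sketch only: generic masking-based check of imputation accuracy per SNP.
# Not the scheme from the abstract; the imputer below is a naive placeholder.
from collections import Counter


def hide_calls(genotypes, hidden):
    """Return a copy of the genotype matrix with the listed (row, column) calls set to None."""
    masked = [row[:] for row in genotypes]
    for i, j in hidden:
        masked[i][j] = None
    return masked


def impute_major_allele(masked):
    """Fill each missing call with the most common observed allele at that SNP.

    Assumes every SNP column keeps at least one observed call.
    """
    filled = [row[:] for row in masked]
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] is not None]
        major = Counter(observed).most_common(1)[0][0]
        for row in filled:
            if row[j] is None:
                row[j] = major
    return filled


def per_snp_accuracy(truth, imputed, hidden):
    """Fraction of hidden calls that were imputed correctly, keyed by SNP column."""
    hits, totals = {}, {}
    for i, j in hidden:
        totals[j] = totals.get(j, 0) + 1
        hits[j] = hits.get(j, 0) + (1 if truth[i][j] == imputed[i][j] else 0)
    return {j: hits[j] / totals[j] for j in totals}


def keep_accurate_snps(accuracy, threshold):
    """Indices of SNPs whose estimated accuracy reaches the threshold."""
    return sorted(j for j, acc in accuracy.items() if acc >= threshold)


# Toy data: rows are accessions, columns are SNPs, alleles coded 0/1.
truth = [[0, 1], [0, 1], [0, 0], [1, 1]]
hidden = [(3, 0), (2, 1), (0, 1)]
imputed = impute_major_allele(hide_calls(truth, hidden))
accuracy = per_snp_accuracy(truth, imputed, hidden)
kept = keep_accurate_snps(accuracy, threshold=0.5)
```

In a real analysis the placeholder imputer would be replaced by a dedicated imputation tool run against the reference panel, and the threshold would be chosen from the observed accuracy distribution.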