Results 1 - 20 of 30
1.
New Phytol ; 242(3): 947-959, 2024 May.
Article in English | MEDLINE | ID: mdl-38509854

ABSTRACT

Many plant populations exhibit synchronous flowering, which can be advantageous in plant reproduction. However, molecular mechanisms underlying flowering synchrony remain poorly understood. We studied the role of known vernalization-response and flower-promoting pathways in facilitating synchronized flowering in Arabidopsis thaliana. Using the vernalization-responsive Col-FRI genotype, we experimentally varied germination dates and daylength among individuals to test flowering synchrony in field and controlled environments. We assessed the activity of flowering regulation pathways by measuring gene expression across leaves produced at different time points during development and through a mutant analysis. We observed flowering synchrony across germination cohorts in both environments and discovered a previously unknown process where flower-promoting and repressing signals are differentially regulated between leaves that developed under different environmental conditions. We hypothesized this mechanism may underlie synchronization. However, our experiments demonstrated that signals originating from sources other than leaves must also play a pivotal role in synchronizing flowering time, especially in germination cohorts with prolonged growth before vernalization. Our results suggest flowering synchrony is promoted by a plant-wide integration of flowering signals across leaves and among organs. To summarize our findings, we propose a new conceptual model of vernalization-induced flowering synchrony and provide suggestions for future research in this field.


Subjects
Arabidopsis Proteins , Arabidopsis , Humans , Arabidopsis/metabolism , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Vernalization , Flowers/physiology , Reproduction , Gene Expression Regulation, Plant , MADS Domain Proteins/genetics , MADS Domain Proteins/metabolism
2.
Int J Mol Sci ; 23(23)2022 Nov 22.
Article in English | MEDLINE | ID: mdl-36498886

ABSTRACT

Recent advances in maize doubled haploid (DH) technology have enabled the development of large numbers of DH lines quickly and efficiently. However, testing all possible hybrid crosses among DH lines is a challenge. Phenotyping haploid progenitors created during the DH process could accelerate the selection of DH lines. Based on phenotypic and genotypic data of a DH population and its corresponding haploids, we compared phenotypes and estimated genetic correlations between the two populations, compared genomic prediction accuracy of multi-trait models against conventional univariate models within the DH population, and evaluated whether incorporating phenotypic data from haploid lines into a multi-trait model could better predict performance of DH lines. We found significant phenotypic differences between DH and haploid lines for nearly all traits; however, their genetic correlations between populations were moderate to strong. Furthermore, a multi-trait model taking into account genetic correlations between traits in the single-environment trial or genetic covariances in multi-environment trials can significantly increase genomic prediction accuracy. However, integrating information of haploid lines did not further improve our prediction. Our findings highlight the superiority of multi-trait models in predicting performance of DH lines in maize breeding, but do not support the routine phenotyping and selection on haploid progenitors of DH lines.


Subjects
Plant Breeding , Zea mays , Zea mays/genetics , Haploidy , Phenotype , Genotype
3.
Mol Biol Evol ; 39(11)2022 11 03.
Article in English | MEDLINE | ID: mdl-36327321

ABSTRACT

Maize is a staple food of smallholder farmers living in highland regions up to 4,000 m above sea level worldwide. Mexican and South American highlands are two major highland maize growing regions, and population genetic data suggest the maize's adaptation to these regions occurred largely independently, providing a case study for convergent evolution. To better understand the mechanistic basis of highland adaptation, we crossed maize landraces from 108 highland and lowland sites of Mexico and South America with the inbred line B73 to produce F1 hybrids and grew them in both highland and lowland sites in Mexico. We identified thousands of genes with divergent expression between highland and lowland populations. Hundreds of these genes show patterns of convergent evolution between Mexico and South America. To dissect the genetic architecture of the divergent gene expression, we developed a novel allele-specific expression analysis pipeline to detect genes with divergent functional cis-regulatory variation between highland and lowland populations. We identified hundreds of genes with divergent cis-regulation between highland and lowland landrace alleles, with 20 in common between regions, further suggesting convergence in the genes underlying highland adaptation. Further analyses suggest multiple mechanisms contribute to this convergence in gene regulation. Although the vast majority of evolutionary changes associated with highland adaptation were region specific, our findings highlight an important role for convergence at the gene expression and gene regulation levels as well.


Subjects
Adaptation, Physiological , Zea mays , Zea mays/genetics , Alleles , Adaptation, Physiological/genetics , Genetics, Population , Acclimatization
4.
Genetics ; 222(2)2022 09 30.
Article in English | MEDLINE | ID: mdl-35961029

ABSTRACT

The interaction of evolutionary processes to determine quantitative genetic variation has implications for contemporary and future phenotypic evolution, as well as for our ability to detect causal genetic variants. While theoretical studies have provided robust predictions to discriminate among competing models, empirical assessment of these has been limited. In particular, theory highlights the importance of pleiotropy in resolving observations of selection and mutation, but empirical investigations have typically been limited to few traits. Here, we applied high-dimensional Bayesian Sparse Factor Genetic modeling to gene expression datasets in 2 species, Drosophila melanogaster and Drosophila serrata, to explore the distributions of genetic variance across high-dimensional phenotypic space. Surprisingly, most of the heritable trait covariation was due to few lines (genotypes) with extreme [>3 interquartile ranges (IQR) from the median] values. Intriguingly, while genotypes extreme for a multivariate factor also tended to have a higher proportion of individual traits that were extreme, we also observed genotypes that were extreme for multivariate factors but not for any individual trait. We observed other consistent differences between heritable multivariate factors with outlier lines vs those factors without extreme values, including differences in gene functions. We use these observations to identify further data required to advance our understanding of the evolutionary dynamics and nature of standing genetic variation for quantitative traits.


Subjects
Drosophila , Models, Genetic , Animals , Bayes Theorem , Drosophila/genetics , Drosophila melanogaster/genetics , Genetic Variation , Phenotype , Selection, Genetic
5.
Evol Appl ; 15(5): 817-837, 2022 May.
Article in English | MEDLINE | ID: mdl-35603032

ABSTRACT

Populations are locally adapted when they exhibit higher fitness than foreign populations in their native habitat. Maize landrace adaptations to highland and lowland conditions are of interest to researchers and breeders. To determine the prevalence and strength of local adaptation in maize landraces, we performed a reciprocal transplant experiment across an elevational gradient in Mexico. We grew 120 landraces, grouped into four populations (Mexican Highland, Mexican Lowland, South American Highland, South American Lowland), in Mexican highland and lowland common gardens and collected phenotypes relevant to fitness and known highland-adaptive traits such as anthocyanin pigmentation and macrohair density. 67k DArTseq markers were generated from field specimens to allow comparisons between phenotypic patterns and population genetic structure. We found phenotypic patterns consistent with local adaptation, though these patterns differ between the Mexican and South American populations. Quantitative trait differentiation (QST) was greater than neutral allele frequency differentiation (FST) for many traits, signaling directional selection between pairs of populations. All populations exhibited higher fitness metric values when grown at their native elevation, and Mexican landraces had higher fitness than South American landraces when grown in these Mexican sites. As environmental distance between landraces' native collection sites and common garden sites increased, fitness values dropped, suggesting landraces are adapted to environmental conditions at their natal sites. Correlations between fitness and anthocyanin pigmentation and macrohair traits were stronger in the highland site than the lowland site, supporting their status as highland-adaptive. These results give substance to the long-held presumption of local adaptation of New World maize landraces to elevation and other environmental variables across North and South America.

6.
G3 (Bethesda) ; 12(3)2022 03 04.
Article in English | MEDLINE | ID: mdl-35134181

ABSTRACT

Genotype-by-environment interactions are a significant challenge for crop breeding as well as being important for understanding the genetic basis of environmental adaptation. In this study, we analyzed genotype-by-environment interactions in a maize multiparent advanced generation intercross population grown across 5 environments. We found that genotype-by-environment interactions contributed as much as genotypic effects to the variation in some agronomically important traits. To understand how genetic correlations between traits change across environments, we estimated the genetic variance-covariance matrix in each environment. Changes in genetic covariances between traits across environments were common, even among traits that show low genotype-by-environment variance. We also performed a genome-wide association study to identify markers associated with genotype-by-environment interactions but found only a small number of significantly associated markers, possibly due to the highly polygenic nature of genotype-by-environment interactions in this population.


Subjects
Genome-Wide Association Study , Zea mays , Gene-Environment Interaction , Genotype , Phenotype , Plant Breeding , Zea mays/genetics
7.
G3 (Bethesda) ; 12(3)2022 03 04.
Article in English | MEDLINE | ID: mdl-35100382

ABSTRACT

The search for quantitative trait loci that explain complex traits such as yield and drought tolerance has been ongoing in all crops. Methods such as biparental quantitative trait loci mapping and genome-wide association studies each have their own advantages and limitations. Multiparent advanced generation intercross populations contain more recombination events and genetic diversity than biparental mapping populations and are better able to estimate effect sizes of rare alleles than association mapping populations. Here, we discuss the results of using a multiparent advanced generation intercross population of doubled haploid maize lines created from 16 diverse founders to perform quantitative trait loci mapping. We compare 3 models that assume bi-allelic, founder, and ancestral haplotype allelic states for quantitative trait loci. The 3 methods have differing power to detect quantitative trait loci for a variety of agronomic traits. Although the founder approach finds the most quantitative trait loci, all methods are able to find unique quantitative trait loci, suggesting that each model has advantages for traits with different genetic architectures. A closer look at a well-characterized flowering time quantitative trait locus, qDTA8, which contains vgt1, highlights the strengths and weaknesses of each method and suggests a potential epistatic interaction. Overall, our results reinforce the importance of considering different approaches to analyzing genotypic datasets, and show the limitations of binary SNP data for identifying multiallelic quantitative trait loci.


Subjects
Genome-Wide Association Study , Quantitative Trait Loci , Alleles , Chromosome Mapping/methods , Crosses, Genetic
8.
Theor Appl Genet ; 134(12): 4043-4054, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34643760

ABSTRACT

KEY MESSAGE: Integration of multi-omics data improved prediction accuracies of oat agronomic and seed nutritional traits in multi-environment trials and distantly related populations in addition to the single-environment prediction. Multi-omics prediction has been shown to be superior to genomic prediction with genome-wide DNA-based genetic markers (G) for predicting phenotypes. However, most of the existing studies were based on historical datasets from one environment; therefore, they were unable to evaluate the efficiency of multi-omics prediction in multi-environment trials and distantly related populations. To fill those gaps, we designed a systematic experiment to collect omics data and evaluate 17 traits in two oat breeding populations planted in single and multiple environments. In the single-environment trial, transcriptomic BLUP (T), metabolomic BLUP (M), G + T, G + M, and G + T + M models showed greater prediction accuracy than GBLUP for 5, 10, 11, 17, and 17 traits, respectively, and metabolites generally performed better than transcripts when combined with SNPs. In the multi-environment trial, multi-trait models with omics data outperformed both counterpart multi-trait GBLUP models and single-environment omics models, and the highest prediction accuracy was achieved when modeling genetic covariance as an unstructured covariance model. We also demonstrated that omics data can be used to prioritize loci from one population with omics data to improve genomic prediction in a distantly related population using a two-kernel linear model that accommodated both likely causal loci with large effects and loci that explain little or no phenotypic variance. We propose that the two-kernel linear model is superior to most genomic prediction models that assume each variant is equally likely to affect the trait and can be used to improve prediction accuracy for any trait with prior knowledge of genetic architecture.


Subjects
Avena/genetics , Models, Genetic , Nutritive Value , Seeds/chemistry , Avena/chemistry , Genetic Markers , Metabolome , Phenotype , Plant Breeding , Polymorphism, Single Nucleotide , Transcriptome
9.
G3 (Bethesda) ; 11(10)2021 09 27.
Article in English | MEDLINE | ID: mdl-34568924

ABSTRACT

Implementing genomic-based prediction models in genomic selection requires an understanding of the measures for evaluating prediction accuracy from different models and methods using multi-trait data. In this study, we compared prediction accuracy using six large multi-trait wheat data sets (quality and grain yield). The data were used to predict 1 year (testing) from the previous year (training) to assess prediction accuracy using four different prediction models. The results indicated that the conventional Pearson's correlation between observed and predicted values underestimated the true correlation value, whereas the corrected Pearson's correlation calculated by fitting a bivariate model was higher than the division of the Pearson's correlation by the square root of the heritability across traits, by 2.53-11.46%. Across the datasets, the corrected Pearson's correlation was higher than the uncorrected by 5.80-14.01%. Overall, we found that for grain yield the prediction performance was highest using a multi-trait compared to a single-trait model. The higher the absolute genetic correlation between traits the greater the benefits of multi-trait models for increasing the genomic-enabled prediction accuracy of traits.


Subjects
Plant Breeding , Triticum , Genomics , Genotype , Models, Genetic , Phenotype , Selection, Genetic , Triticum/genetics
12.
Genome Biol ; 22(1): 213, 2021 07 23.
Article in English | MEDLINE | ID: mdl-34301310

ABSTRACT

Large-scale phenotype data can enhance the power of genomic prediction in plant and animal breeding, as well as human genetics. However, the statistical foundation of multi-trait genomic prediction is based on the multivariate linear mixed effect model, a tool notorious for its fragility when applied to more than a handful of traits. We present MegaLMM, a statistical framework and associated software package for mixed model analyses of a virtually unlimited number of traits. Using three examples with real plant data, we show that MegaLMM can leverage thousands of traits at once to significantly improve genetic value prediction accuracy.


Subjects
Arabidopsis/genetics , Genome, Plant , Models, Genetic , Quantitative Trait, Heritable , Software , Triticum/genetics , Zea mays/genetics , Bayes Theorem , Gene-Environment Interaction , Genomics , Genotype , Humans , Phenotype , Plant Breeding
13.
Cell ; 184(12): 3333-3348.e19, 2021 06 10.
Article in English | MEDLINE | ID: mdl-34010619

ABSTRACT

Plant species have evolved myriads of solutions, including complex cell type development and regulation, to adapt to dynamic environments. To understand this cellular diversity, we profiled tomato root cell type translatomes. Using xylem differentiation in tomato, examples of functional innovation, repurposing, and conservation of transcription factors are described, relative to the model plant Arabidopsis. Repurposing and innovation of genes are further observed within an exodermis regulatory network and illustrate its function. Comparative translatome analyses of rice, tomato, and Arabidopsis cell populations suggest increased expression conservation of root meristems compared with other homologous populations. In addition, the functions of constitutively expressed genes are more conserved than those of cell type/tissue-enriched genes. These observations suggest that higher order properties of cell type and pan-cell type regulation are evolutionarily conserved between plants and animals.


Subjects
Arabidopsis/genetics , Genes, Plant , Inventions , Plant Roots/growth & development , Plant Roots/genetics , Solanum lycopersicum/genetics , Gene Expression Regulation, Plant , Gene Regulatory Networks , Green Fluorescent Proteins/metabolism , Solanum lycopersicum/cytology , Meristem/metabolism , Plant Proteins/genetics , Plant Proteins/metabolism , Plant Roots/cytology , Promoter Regions, Genetic/genetics , Protein Biosynthesis , Species Specificity , Transcription Factors/metabolism , Xylem/genetics
14.
G3 (Bethesda) ; 11(3)2021 03 16.
Article in English | MEDLINE | ID: mdl-33772307

ABSTRACT

The widely recounted story of the origin of cultivated strawberry (Fragaria × ananassa) oversimplifies the complex interspecific hybrid ancestry of the highly admixed populations from which heirloom and modern cultivars have emerged. To develop deeper insights into the three-century-long domestication history of strawberry, we reconstructed the genealogy as deeply as possible: pedigree records were assembled for 8,851 individuals, including 2,656 cultivars developed since 1775. The parents of individuals with unverified or missing pedigree records were accurately identified by applying an exclusion analysis to array-genotyped single-nucleotide polymorphisms. We identified 187 wild octoploid and 1,171 F. × ananassa founders in the genealogy, from the earliest hybrids to modern cultivars. The pedigree networks for cultivated strawberry are exceedingly complex labyrinths of ancestral interconnections formed by diverse hybrid ancestry, directional selection, migration, admixture, bottlenecks, overlapping generations, and recurrent hybridization with common ancestors that have unequally contributed allelic diversity to heirloom and modern cultivars. Fifteen to 333 ancestors were predicted to have transmitted 90% of the alleles found in country-, region-, and continent-specific populations. Using parent-offspring edges in the global pedigree network, we found that selection cycle lengths over the past 200 years of breeding have been extraordinarily long (16.0-16.9 years/generation), but decreased to a present-day range of 6.0-10.0 years/generation. Our analyses uncovered conspicuous differences in the ancestry and structure of North American and European populations, and shed light on forces that have shaped phenotypic diversity in F. × ananassa.


Subjects
Domestication , Fragaria , Fragaria/genetics , Hybridization, Genetic , Plant Breeding
15.
Ecol Evol ; 11(3): 1100-1110, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33598117

ABSTRACT

Ecological restoration often requires translocating plant material from distant sites. Importing suitable plant material is important for successful establishment and persistence. Yet, published guidelines for seed transfer are available for very few species. Accurately predicting how transferred plants will perform requires multiyear and multi-environment field trials and comprehensive follow-up work, and is therefore infeasible given the number of species used in restoration programs. Alternative methods to predict the outcomes of seed transfer are valuable for species without published guidelines. In this study, we analyzed the genetic structure of an important shrub used in ecological restoration in the Southern Rocky Mountains called alder-leaf mountain mahogany (Cercocarpus montanus). We sequenced DNA from 1,440 plants in 48 populations across a broad geographic range. We found that genetic heterogeneity among populations reflected the complex climate and topography across which the species is distributed. We identified temperature and precipitation variables that were useful predictors of genetic differentiation and can be used to generate seed transfer recommendations. These results will be valuable for defining management and restoration practices for mountain mahogany.

16.
Proc Natl Acad Sci U S A ; 117(5): 2526-2534, 2020 02 04.
Article in English | MEDLINE | ID: mdl-31964817

ABSTRACT

The seasonal timing of seed germination determines a plant's realized environmental niche, and is important for adaptation to climate. The timing of seasonal germination depends on patterns of seed dormancy release or induction by cold and interacts with flowering-time variation to construct different seasonal life histories. To characterize the genetic basis and climatic associations of natural variation in seed chilling responses and associated life-history syndromes, we selected 559 fully sequenced accessions of the model annual species Arabidopsis thaliana from across a wide climate range and scored each for seed germination across a range of 13 cold stratification treatments, as well as the timing of flowering and senescence. Germination strategies varied continuously along 2 major axes: 1) Overall germination fraction and 2) induction vs. release of dormancy by cold. Natural variation in seed responses to chilling was correlated with flowering time and senescence to create a range of seasonal life-history syndromes. Genome-wide association identified several loci associated with natural variation in seed chilling responses, including a known functional polymorphism in the self-binding domain of the candidate gene DOG1. A phylogeny of DOG1 haplotypes revealed ancient divergence of these functional variants associated with periods of Pleistocene climate change, and Gradient Forest analysis showed that allele turnover of candidate SNPs was significantly associated with climate gradients. These results provide evidence that A. thaliana's germination niche and correlated life-history syndromes are shaped by past climate cycles, as well as local adaptation to contemporary climate.


Subjects
Arabidopsis Proteins/metabolism , Arabidopsis/metabolism , Seeds/chemistry , Alleles , Arabidopsis/genetics , Arabidopsis/growth & development , Arabidopsis Proteins/genetics , Cold Temperature , Gene Expression Regulation, Plant , Germination , Life History Traits , Polymorphism, Genetic , Seasons , Seeds/genetics , Seeds/growth & development , Seeds/metabolism
17.
Plant J ; 102(2): 383-397, 2020 04.
Article in English | MEDLINE | ID: mdl-31797460

ABSTRACT

Understanding the impact of elevated CO2 (eCO2) in global agriculture is important given climate change projections. Breeding climate-resilient crops depends on genetic variation within naturally varying populations. The effect of genetic variation in response to eCO2 is poorly understood, especially in crop species. We describe the different ways in which Solanum lycopersicum and its wild relative S. pennellii respond to eCO2, from cell anatomy, to the transcriptome, and metabolome. We further validate the importance of translational regulation as a potential mechanism for plants to adaptively respond to rising levels of atmospheric CO2.


Subjects
Carbon Dioxide/metabolism , Gene Expression Regulation, Plant , Protein Biosynthesis , Solanum/physiology , Transcriptome , Biomass , Climate Change , Crops, Agricultural , Genetic Variation , Metabolome , Photosynthesis , Plant Roots/anatomy & histology , Plant Roots/genetics , Plant Roots/growth & development , Plant Roots/physiology , Polyribosomes , RNA, Messenger/genetics , RNA, Plant/genetics , Solanum/anatomy & histology , Solanum/genetics , Solanum/growth & development
18.
Proc Natl Acad Sci U S A ; 116(36): 17890-17899, 2019 09 03.
Article in English | MEDLINE | ID: mdl-31420516

ABSTRACT

Contrary to previous assumptions that most mutations are deleterious, there is increasing evidence for persistence of large-effect mutations in natural populations. A possible explanation for these observations is that mutant phenotypes and fitness may depend upon the specific environmental conditions to which a mutant is exposed. Here, we tested this hypothesis by growing large-effect flowering time mutants of Arabidopsis thaliana in multiple field sites and seasons to quantify their fitness effects in realistic natural conditions. By constructing environment-specific fitness landscapes based on flowering time and branching architecture, we observed that a subset of mutations increased fitness, but only in specific environments. These mutations increased fitness via different paths: through shifting flowering time, branching, or both. Branching was under stronger selection, but flowering time was more genetically variable, pointing to the importance of indirect selection on mutations through their pleiotropic effects on multiple phenotypes. Finally, mutations in hub genes with greater connectedness in their regulatory networks had greater effects on both phenotypes and fitness. Together, these findings indicate that large-effect mutations may persist in populations because they influence traits that are adaptive only under specific environmental conditions. Understanding their evolutionary dynamics therefore requires measuring their effects in multiple natural environments.


Subjects
Adaptation, Biological , Arabidopsis/physiology , Flowers/physiology , Mutation , Selection, Genetic , Biological Evolution , Computational Biology/methods , Gene Expression Profiling , Genetic Association Studies , Genotype , Phenotype , Seasons , Transcriptome
19.
PLoS Genet ; 15(2): e1007978, 2019 02.
Article in English | MEDLINE | ID: mdl-30735486

ABSTRACT

Linear mixed effect models are powerful tools used to account for population structure in genome-wide association studies (GWASs) and estimate the genetic architecture of complex traits. However, fully-specified models are computationally demanding and common simplifications often lead to reduced power or biased inference. We describe Grid-LMM (https://github.com/deruncie/GridLMM), an extendable algorithm for repeatedly fitting complex linear models that account for multiple sources of heterogeneity, such as additive and non-additive genetic variance, spatial heterogeneity, and genotype-environment interactions. Grid-LMM can compute approximate (yet highly accurate) frequentist test statistics or Bayesian posterior summaries at a genome-wide scale in a fraction of the time compared to existing general-purpose methods. We apply Grid-LMM to two types of quantitative genetic analyses. The first is focused on accounting for spatial variability and non-additive genetic variance while scanning for QTL; and the second aims to identify gene expression traits affected by non-additive genetic variation. In both cases, modeling multiple sources of heterogeneity leads to new discoveries.


Subjects
Algorithms , Linear Models , Models, Genetic , Animals , Arabidopsis/genetics , Arabidopsis/growth & development , Bayes Theorem , Body Weight/genetics , Computer Simulation , Flowers/genetics , Flowers/growth & development , Gene-Environment Interaction , Genetic Markers , Genetic Variation , Genome-Wide Association Study/statistics & numerical data , Humans , Mice , Quantitative Trait Loci
20.
Ann Appl Stat ; 13(2): 958-989, 2019 Jun.
Article in English | MEDLINE | ID: mdl-32542104

ABSTRACT

The central aim in this paper is to address variable selection questions in nonlinear and nonparametric regression. Motivated by statistical genetics, where nonlinear interactions are of particular interest, we introduce a novel and interpretable way to summarize the relative importance of predictor variables. Methodologically, we develop the "RelATive cEntrality" (RATE) measure to prioritize candidate genetic variants that are not just marginally important, but whose associations also stem from significant covarying relationships with other variants in the data. We illustrate RATE through Bayesian Gaussian process regression, but the methodological innovations apply to other "black box" methods. It is known that nonlinear models often exhibit greater predictive accuracy than linear models, particularly for phenotypes generated by complex genetic architectures. With detailed simulations and two real data association mapping studies, we show that applying RATE enables an explanation for this improved performance.
