Results 1 - 5 of 5
1.
Bioinformatics ; 35(20): 4045-4052, 2019 Oct 15.
Article in English | MEDLINE | ID: mdl-30977782

ABSTRACT

MOTIVATION: Interaction between the genotype and the environment (G×E) has a strong impact on the yield of major crop plants. Although influential, taking G×E explicitly into account in plant breeding has remained difficult. Recently G×E has been predicted from environmental and genomic covariates, but existing works have not shown that generalization to new environments and years without access to in-season data is possible and practical applicability remains unclear. Using data from a Barley breeding programme in Finland, we construct an in silico experiment to study the viability of G×E prediction under practical constraints. RESULTS: We show that the response to the environment of a new generation of untested Barley cultivars can be predicted in new locations and years using genomic data, machine learning and historical weather observations for the new locations. Our results highlight the need for models of G×E: non-linear effects clearly dominate linear ones, and the interaction between the soil type and daily rain is identified as the main driver for G×E for Barley in Finland. Our study implies that genomic selection can be used to capture the yield potential in G×E effects for future growth seasons, providing a possible means to achieve yield improvements, needed for feeding the growing population. AVAILABILITY AND IMPLEMENTATION: The data accompanied by the method code (http://research.cs.aalto.fi/pml/software/gxe/bioinformatics_codes.zip) is available in the form of kernels to allow reproducing the results. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genomics , Genetic Models , Gene-Environment Interaction , Genotype , Phenotype , Weather
2.
G3 (Bethesda) ; 8(1): 131-147, 2018 Jan 04.
Article in English | MEDLINE | ID: mdl-29097376

ABSTRACT

In genomic-enabled prediction, improving the accuracy of the prediction of lines in environments is difficult because the available information is generally sparse and usually has low correlations between traits. In current genomic selection, although researchers have a large amount of information and appropriate statistical models to process it, there is still limited computing efficiency to do so. Although some statistical models are mathematically elegant, many of them are also computationally inefficient, and they are impractical for many traits, lines, environments, and years because they need to sample from very large multivariate normal distributions. For these reasons, this study explores two recommender systems: item-based collaborative filtering (IBCF) and the matrix factorization algorithm (MF) in the context of multiple traits and multiple environments. The IBCF and MF methods were compared with two conventional methods on simulated and real data. Results of the simulated and real data sets show that the IBCF technique was slightly better in terms of prediction accuracy than the two conventional methods and the MF method when the correlation was moderately high. The IBCF technique is very attractive because it produces good predictions when there is high correlation between items (environment-trait combinations) and its implementation is computationally feasible, which can be useful for plant breeders who deal with very large data sets.


Subjects
Gene-Environment Interaction , Plant Genome , Statistical Models , Plant Breeding/methods , Heritable Quantitative Trait , Triticum/genetics , Zea mays/genetics , Algorithms , Agricultural Crops , Genotype , Genetic Models , Phenotype , Ploidies , Single Nucleotide Polymorphism , Genetic Selection
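The item-based collaborative filtering idea named in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `cosine` and `predict`, the choice of cosine similarity, and the toy matrix are all assumptions made for illustration. Rows stand for lines, columns for items (environment-trait combinations), and `None` marks an unobserved value.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity over positions where both vectors are observed."""
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    if not pairs:
        return 0.0
    dot = sum(a * b for a, b in pairs)
    norm_u = sqrt(sum(a * a for a, _ in pairs))
    norm_v = sqrt(sum(b * b for _, b in pairs))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(matrix, row, col):
    """Predict matrix[row][col] as a similarity-weighted average of the
    other items observed in the same row (hypothetical helper)."""
    n_items = len(matrix[0])
    # Item vectors are built from all rows except the one being predicted.
    target = [r[col] for i, r in enumerate(matrix) if i != row]
    num = 0.0
    den = 0.0
    for j in range(n_items):
        if j == col or matrix[row][j] is None:
            continue
        other = [r[j] for i, r in enumerate(matrix) if i != row]
        sim = cosine(target, other)
        num += sim * matrix[row][j]
        den += abs(sim)
    return num / den if den > 0 else None


# Toy standardized scores (made up): 4 lines x 3 environment-trait items.
scores = [
    [1.0, 0.8, -0.5],
    [-1.0, -0.9, 0.4],
    [0.5, 0.6, -0.2],
    [0.9, None, -0.6],
]
estimate = predict(scores, 3, 1)
```

The prediction relies only on item-to-item similarities, which is why the approach works best when items are strongly correlated, as the abstract notes.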
3.
PLoS One ; 12(3): e0174399, 2017.
Article in English | MEDLINE | ID: mdl-28350823

ABSTRACT

OBJECTIVES: Preeclampsia is divided into early-onset (delivery before 34 weeks of gestation) and late-onset (delivery at or after 34 weeks) subtypes, which may arise from different etiopathogenic backgrounds. Early-onset disease is associated with placental dysfunction. Late-onset disease develops predominantly due to metabolic disturbances, obesity, diabetes, lipid dysfunction, and inflammation, which affect endothelial function. Our aim was to use cluster analysis to investigate clinical factors predicting the onset and severity of preeclampsia in a cohort of women with known clinical risk factors. METHODS: We recruited 903 pregnant women with risk factors for preeclampsia at gestational weeks 12+0-13+6. Each individual outcome diagnosis was independently verified from medical records. We applied a Bayesian clustering algorithm to assign the study participants to clusters based on their particular risk factor combination. For each cluster, we computed the risk ratio of each disease outcome, relative to the risk in the general population. RESULTS: The risk of preeclampsia increased exponentially with respect to the number of risk factors. Our analysis revealed 25 clusters. Preeclampsia in a previous pregnancy (n = 138) increased the risk of preeclampsia 8.1-fold (95% CI 5.7-11.2) compared to a general population of pregnant women. Having a small for gestational age infant (n = 57) in a previous pregnancy increased the risk of early-onset preeclampsia 17.5-fold (95% CI 2.1-60.5). The cluster combining those two risk factors (n = 21) increased the risk of severe preeclampsia 23.8-fold (95% CI 5.1-60.6), of intermediate-onset preeclampsia (delivery between 34+0-36+6 weeks of gestation) 25.1-fold (95% CI 3.1-79.9), and of preterm preeclampsia (delivery before 37+0 weeks of gestation) 16.4-fold (95% CI 2.0-52.4). Body mass index over 30 kg/m² (n = 228) as a sole risk factor increased the risk of preeclampsia 2.1-fold (95% CI 1.1-3.6). Together with preeclampsia in an earlier pregnancy, the risk increased 11.4-fold (95% CI 4.5-20.9). Chronic hypertension (n = 60) increased the risk of preeclampsia 5.3-fold (95% CI 2.4-9.8), of severe preeclampsia 22.2-fold (95% CI 9.9-41.0), and of early-onset preeclampsia 16.7-fold (95% CI 2.0-57.6). If a woman had chronic hypertension combined with obesity, gestational diabetes and earlier preeclampsia, the risk of term preeclampsia increased 4.8-fold (95% CI 0.1-21.7). Women with type 1 diabetes mellitus had a high risk of all subgroups of preeclampsia. CONCLUSION: The risk of preeclampsia increases exponentially with respect to the number of risk factors. Early-onset preeclampsia and severe preeclampsia have a different risk profile from term preeclampsia.


Subjects
Fetal Growth Retardation/epidemiology , Pre-Eclampsia/epidemiology , Adult , Bayes Theorem , Cluster Analysis , Female , Fetal Growth Retardation/prevention & control , Gestational Age , Humans , Odds Ratio , Pre-Eclampsia/prevention & control , Pregnancy , Risk Factors
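The per-cluster risk ratio described in the abstract above compares the outcome risk inside a cluster with the risk in the general population. The sketch below shows only that point estimate. The function name `risk_ratio` and the numbers in the usage line are hypothetical, and the abstract's confidence intervals come from the authors' own analysis, which is not reproduced here.

```python
def risk_ratio(cases, cluster_size, baseline_risk):
    """Point estimate of the risk in a cluster relative to a baseline risk.

    cases         -- number of women in the cluster with the outcome
    cluster_size  -- total number of women in the cluster
    baseline_risk -- outcome risk in the reference population (0 < risk <= 1)
    """
    if cluster_size <= 0:
        raise ValueError("cluster_size must be positive")
    if not 0 < baseline_risk <= 1:
        raise ValueError("baseline_risk must lie in (0, 1]")
    if not 0 <= cases <= cluster_size:
        raise ValueError("cases must lie between 0 and cluster_size")
    cluster_risk = cases / cluster_size
    return cluster_risk / baseline_risk


# Made-up numbers for illustration only: 10 cases among 50 women,
# against an assumed baseline risk of 5%.
example_ratio = risk_ratio(10, 50, 0.05)
```

A ratio above 1 means the outcome is more common in the cluster than in the reference population. Small clusters give unstable estimates, which is consistent with the wide intervals reported for the smaller groups.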
4.
Bioinformatics ; 30(14): 2026-34, 2014 Jul 15.
Article in English | MEDLINE | ID: mdl-24665129

ABSTRACT

MOTIVATION: A typical genome-wide association study searches for associations between single nucleotide polymorphisms (SNPs) and a univariate phenotype. However, there is a growing interest in investigating associations between genomic data and multivariate phenotypes, for example, in gene expression or metabolomics studies. A common approach is to perform a univariate test between each genotype-phenotype pair, and then to apply a stringent significance cutoff to account for the large number of tests performed. However, this approach has limited ability to uncover dependencies involving multiple variables. Another trend in current genetics is the investigation of the impact of rare variants on the phenotype, where the standard methods often fail owing to lack of power when the minor allele is present in only a limited number of individuals. RESULTS: We propose a new statistical approach based on Bayesian reduced rank regression to assess the impact of multiple SNPs on a high-dimensional phenotype. Because of the method's ability to combine information over multiple SNPs and phenotypes, it is particularly suitable for detecting associations involving rare variants. We demonstrate the potential of our method and compare it with alternatives using the Northern Finland Birth Cohort with 4702 individuals, for whom genome-wide SNP data along with lipoprotein profiles comprising 74 traits are available. We discovered two genes (XRCC4 and MTHFD2L) without previously reported associations, which replicated in a combined analysis of two additional cohorts: 2390 individuals from the Cardiovascular Risk in Young Finns study and 3659 individuals from the FINRISK study. AVAILABILITY AND IMPLEMENTATION: R code is freely available for download at http://users.ics.aalto.fi/pemartti/gene_metabolome/.


Subjects
Genome-Wide Association Study/methods , Metabolomics/methods , Single Nucleotide Polymorphism , Adult , Bayes Theorem , Humans , Lipoproteins/blood , Metabolome , Multivariate Analysis , Phenotype , Regression Analysis
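The "common approach" that the abstract above contrasts itself with, one univariate test per genotype-phenotype pair followed by a stringent cutoff, can be sketched as follows. A Bonferroni correction is assumed here as the cutoff. The abstract does not say which correction is meant, and the function names, SNP labels, and p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_snps, n_traits):
    """Per-test significance cutoff when every SNP is tested against every trait."""
    n_tests = n_snps * n_traits
    if n_tests <= 0:
        raise ValueError("need at least one test")
    return alpha / n_tests


def significant_pairs(p_values, alpha=0.05):
    """Return the (snp, trait) pairs whose p-value passes the Bonferroni cutoff.

    p_values -- dict mapping (snp, trait) tuples to univariate p-values
    """
    if not p_values:
        return []
    cutoff = alpha / len(p_values)
    return sorted(pair for pair, p in p_values.items() if p < cutoff)


# Made-up p-values for two SNPs and two traits (four tests, cutoff 0.0125).
toy_p_values = {
    ("snp_a", "trait_1"): 1e-9,
    ("snp_a", "trait_2"): 0.02,
    ("snp_b", "trait_1"): 0.2,
    ("snp_b", "trait_2"): 0.013,
}
hits = significant_pairs(toy_p_values)
```

Because the cutoff shrinks in proportion to the number of SNP-trait pairs, a phenotype with 74 traits makes it 74 times stricter than a single-trait scan over the same SNPs. This is one reason the pairwise approach loses power for weak or rare-variant effects.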
5.
Stat Appl Genet Mol Biol ; 12(4): 413-31, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23759510

ABSTRACT

High-dimensional phenotypes hold promise for richer findings in association studies, but testing of several phenotype traits aggravates the grand challenge of association studies, that of multiple testing. Several methods have recently been proposed for testing jointly all traits in a high-dimensional vector of phenotypes, with prospect of increased power to detect small effects that would be missed if tested individually. However, the methods have rarely been compared to the extent of enabling assessment of their relative merits and setting up guidelines on which method to use, and how to use it. We compare the methods on simulated data and with a real metabolomics data set comprising 137 highly correlated variables and approximately 550,000 SNPs. Applying the methods to genome-wide data with hundreds of thousands of markers inevitably requires division of the problem into manageable parts facilitating parallel processing, parts corresponding to individual genetic variants, pathways, or genes, for example. Here we utilize a straightforward formulation according to which the genome is divided into blocks of nearby correlated genetic markers, tested jointly for association with the phenotypes. This formulation is computationally feasible, reduces the number of tests, and lets the methods take advantage of combining information over several correlated variables not only on the phenotype side, but also on the genotype side. Our experiments show that canonical correlation analysis has higher power than alternative methods, while remaining computationally tractable for routine use in the GWAS setting, provided the number of samples is sufficient compared to the numbers of phenotype and genotype variables tested. Sparse canonical correlation analysis and regression models with latent confounding factors show promising performance when the number of samples is small compared to the dimensionality of the data.


Subjects
Genome-Wide Association Study/methods , Algorithms , Computer Simulation , Statistical Data Interpretation , Humans , Linkage Disequilibrium , Genetic Models , Phenotype , Single Nucleotide Polymorphism , Regression Analysis
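The abstract above describes dividing the genome into blocks of nearby correlated markers that are then tested jointly. One simple way to form such blocks is sketched below. The greedy rule, the squared-correlation threshold `min_r2`, the size cap `max_size`, and the toy genotypes are assumptions made for illustration and are not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant marker carries no correlation information
    return sxy / sqrt(sxx * syy)


def make_blocks(genotypes, min_r2=0.5, max_size=10):
    """Greedily group consecutive markers into blocks.

    genotypes -- list of markers in genomic order; each marker is a list of
                 allele counts (0, 1 or 2), one entry per individual
    A marker joins the current block when its squared correlation with the
    block's last marker is at least min_r2 and the block is not yet full.
    """
    blocks = []
    for idx, marker in enumerate(genotypes):
        if blocks and len(blocks[-1]) < max_size:
            previous = genotypes[blocks[-1][-1]]
            if pearson(previous, marker) ** 2 >= min_r2:
                blocks[-1].append(idx)
                continue
        blocks.append([idx])
    return blocks


# Toy genotypes (made up): four markers observed in six individuals.
toy_genotypes = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 1],
    [2, 0, 1, 2, 1, 0],
    [2, 0, 1, 2, 1, 0],
]
toy_blocks = make_blocks(toy_genotypes)
```

Each resulting block can be handed to a joint test independently of the others, which is what makes the parallel processing mentioned in the abstract possible.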