Search | Secretaria de Estado da Saúde

1.

BELMM: Bayesian model selection and random walk smoothing in time-series clustering.

Sarala, Olli; Pyhäjärvi, Tanja; Sillanpää, Mikko J.

Bioinformatics ; 39(11)2023 11 01.

Article in English | MEDLINE | ID: mdl-37963057

ABSTRACT

MOTIVATION: Due to advances in measuring technology, many new phenotype, gene expression, and other omics time-course datasets are now commonly available. Cluster analysis may provide useful information about the structure of such data. RESULTS: In this work, we propose BELMM (Bayesian Estimation of Latent Mixture Models): a flexible framework for analysing, clustering, and modelling time-series data in a Bayesian setting. The framework is built on mixture modelling: first, the mean curves of the mixture components are assumed to follow random walk smoothing priors. Second, we choose the most plausible model and the number of mixture components using reversible-jump Markov chain Monte Carlo. Last, we assign the individual time series to clusters based on their similarity to the cluster-specific trend curves determined by the latent random walk processes. We demonstrate the use of fast and slow implementations of our approach on both simulated and real time-series data using the widely available software R, Stan, and CU-MSDSp. AVAILABILITY AND IMPLEMENTATION: The French mortality dataset is available at http://www.mortality.org, and the Drosophila melanogaster embryogenesis gene expression data at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE121160. Details on our simulated datasets are available in the Supplementary Material, and R scripts and a detailed tutorial on GitHub at https://github.com/ollisa/BELMM. The software CU-MSDSp is available on GitHub at https://github.com/jtchavisIII/CU-MSDSp.

Subjects

Drosophila melanogaster, Software, Animals, Bayes Theorem, Drosophila melanogaster/genetics, Time Factors, Cluster Analysis
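
The random walk smoothing prior mentioned in entry 1 can be illustrated with a minimal sketch. It assumes a first-order Gaussian random walk, mu[t] ~ N(mu[t-1], sigma^2), conditional on the first value; the abstract does not state the order of the random walk, and the function name and example curves below are hypothetical, not taken from BELMM.

```python
import math


def rw1_log_prior(mu, sigma):
    """Log density of mu[1:] given mu[0] under a first-order random walk.

    Assumption for illustration: increments mu[t] - mu[t-1] are independent
    N(0, sigma^2). This is not the BELMM implementation.
    """
    n = len(mu) - 1  # number of increments
    sq = sum((b - a) ** 2 for a, b in zip(mu, mu[1:]))
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - sq / (2 * sigma ** 2)


# Two hypothetical mean curves observed at five time points.
smooth = [0.0, 0.1, 0.2, 0.3, 0.4]
rough = [0.0, 1.0, -1.0, 1.0, -1.0]

print(rw1_log_prior(smooth, 0.5))
print(rw1_log_prior(rough, 0.5))
```

Under this prior a curve with small successive differences receives a higher prior density than a jagged one, which is the sense in which the prior smooths.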

2.

Genetic fine-mapping from summary data using a nonlocal prior improves the detection of multiple causal variants.

Karhunen, Ville; Launonen, Ilkka; Järvelin, Marjo-Riitta; Sebert, Sylvain; Sillanpää, Mikko J.

Bioinformatics ; 39(7)2023 07 01.

Article in English | MEDLINE | ID: mdl-37348543

ABSTRACT

MOTIVATION: Genome-wide association studies (GWAS) have been successful in identifying genomic loci associated with complex traits. Genetic fine-mapping aims to detect independent causal variants from the GWAS-identified loci, adjusting for linkage disequilibrium patterns. RESULTS: We present "FiniMOM" (fine-mapping using a product inverse-moment prior), a novel Bayesian fine-mapping method for summarized genetic associations. For causal effects, the method uses a nonlocal inverse-moment prior, which is a natural prior distribution to model non-null effects in finite samples. A beta-binomial prior is set for the number of causal variants, with a parameterization that can be used to control for potential misspecifications in the linkage disequilibrium reference. The results of simulation studies designed to mimic a typical GWAS on circulating protein levels show improved credible set coverage and power of the proposed method over the current state-of-the-art fine-mapping method SuSiE, especially in the case of multiple causal variants within a locus. AVAILABILITY AND IMPLEMENTATION: https://vkarhune.github.io/finimom/.

Subjects

Genome-Wide Association Study, Genomics, Genome-Wide Association Study/methods, Bayes Theorem, Chromosome Mapping/methods, Linkage Disequilibrium, Single Nucleotide Polymorphism

3.

Dissecting the genetic architecture of quantitative traits using genome-wide identity-by-descent sharing.

Fraimout, Antoine; Guillaume, Frédéric; Li, Zitong; Sillanpää, Mikko J; Rastas, Pasi; Merilä, Juha.

Mol Ecol ; 33(6): e17299, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38380534

ABSTRACT

Additive and dominance genetic variances underlying the expression of quantitative traits are important quantities for predicting short-term responses to selection, but they are notoriously challenging to estimate in most non-model wild populations. Specifically, large-sized or panmictic populations may be characterized by low variance in genetic relatedness among individuals which, in turn, can prevent accurate estimation of quantitative genetic parameters. We used estimates of genome-wide identity-by-descent (IBD) sharing from autosomal SNP loci to estimate quantitative genetic parameters for ecologically important traits in nine-spined sticklebacks (Pungitius pungitius) from a large, outbred population. Using empirical and simulated datasets, with varying sample sizes and pedigree complexity, we assessed the performance of different crossing schemes in estimating additive genetic variance and heritability for all traits. We found that low variance in relatedness characteristic of wild outbred populations with high migration rate can impair the estimation of quantitative genetic parameters and bias heritability estimates downwards. On the other hand, the use of a half-sib/full-sib design allowed precise estimation of genetic variance components and revealed significant additive variance and heritability for all measured traits, with negligible dominance contributions. Genome-partitioning and QTL mapping analyses revealed that most traits had a polygenic basis and were controlled by genes at multiple chromosomes. Furthermore, different QTL contributed to variation in the same traits in different populations suggesting heterogeneous underpinnings of parallel evolution at the phenotypic level. Our results provide important guidelines for future studies aimed at estimating adaptive potential in the wild, particularly for those conducted in outbred large-sized populations.

Subjects

Genome, Multifactorial Inheritance, Humans, Genome/genetics, Chromosome Mapping, Phenotype, Genetic Models, Single Nucleotide Polymorphism/genetics

4.

Age-dependent genetic architecture across ontogeny of body size in sticklebacks.

Fraimout, Antoine; Li, Zitong; Sillanpää, Mikko J; Merilä, Juha.

Proc Biol Sci ; 289(1975): 20220352, 2022 05 25.

Article in English | MEDLINE | ID: mdl-35582807

ABSTRACT

Heritable variation in traits under natural selection is a prerequisite for evolutionary response. While it is recognized that trait heritability may vary spatially and temporally depending on which environmental conditions traits are expressed under, less is known about the possibility that genetic variance contributing to the expected selection response in a given trait may vary at different stages of ontogeny. Specifically, whether different loci underlie the expression of a trait throughout development, and thus provide an additional source of variation for selection to act on in the wild, is unclear. Here we show that body size, an important life-history trait, is heritable throughout ontogeny in the nine-spined stickleback (Pungitius pungitius). Nevertheless, both analyses of quantitative trait loci and genetic correlations across ages show that different chromosomes/loci contribute to this heritability at different ontogenic time-points. This suggests that body size can respond to selection at different stages of ontogeny but that this response is determined by different loci at different points of development. Hence, our study provides important results regarding our understanding of the genetics of ontogeny and opens an interesting avenue of research for studying age-specific genetic architecture as a source of non-parallel evolution.

Subjects

Smegmamorpha, Animals, Body Size/genetics, Genetic Variation, Phenotype, Quantitative Trait Loci, Genetic Selection, Smegmamorpha/physiology

5.

MCPeSe: Monte Carlo penalty selection for graphical lasso.

Kuismin, Markku; Sillanpää, Mikko J.

Bioinformatics ; 37(5): 726-727, 2021 05 05.

Article in English | MEDLINE | ID: mdl-32805018

ABSTRACT

MOTIVATION: Graphical lasso (Glasso) is a widely used tool for identifying gene regulatory networks in systems biology. However, its computational efficiency depends on the choice of regularization parameter (tuning parameter), and selecting this parameter can be highly time consuming. Although fully Bayesian implementations of Glasso alleviate this problem somewhat by specifying a prior distribution for the parameter, these approaches lack the scalability of their frequentist counterparts. RESULTS: Here, we present a new Monte Carlo Penalty Selection method (MCPeSe), a computationally efficient approach to regularization parameter selection for Glasso. MCPeSe combines the scalability and low computational cost of the frequentist Glasso with the ability to automatically choose the regularization by Bayesian Glasso modeling. MCPeSe provides a state-of-the-art 'tuning-free' model selection criterion for Glasso and allows exploration of the posterior probability distribution of the tuning parameter. AVAILABILITY AND IMPLEMENTATION: R source code of MCPeSe, a step-by-step example showing how to apply MCPeSe and a collection of scripts used to prepare the material in this article are publicly available at GitHub under GPL (https://github.com/markkukuismin/MCPeSe/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Software, Systems Biology, Bayes Theorem, Monte Carlo Method, Probability

6.

Model guided trait-specific co-expression network estimation as a new perspective for identifying molecular interactions and pathways.

Kontio, Juho A J; Pyhäjärvi, Tanja; Sillanpää, Mikko J.

PLoS Comput Biol ; 17(5): e1008960, 2021 05.

Article in English | MEDLINE | ID: mdl-33939702

ABSTRACT

A wide variety of 1) parametric regression models and 2) co-expression networks have been developed for finding gene-by-gene interactions underlying complex traits from expression data. While both methodological schemes have their own well-known benefits, little is known about their synergistic potential. Our study introduces their methodological fusion that cross-exploits the strengths of individual approaches via a built-in information-sharing mechanism. This fusion is theoretically based on certain trait-conditioned dependency patterns between two genes depending on their role in the underlying parametric model. The resulting trait-specific co-expression network estimation method 1) serves to enhance the interpretation of biological networks in a parametric sense, and 2) exploits the underlying parametric model itself in the estimation process. To also account for the substantial amount of intrinsic noise and collinearities often entailed by expression data, a tailored co-expression measure is introduced along with this framework to alleviate related computational problems. A remarkable advance over the reference methods in simulated scenarios substantiates the method's high efficiency. As proof-of-concept, this synergistic approach is successfully applied in survival analysis, with acute myeloid leukemia data, further highlighting the framework's versatility and broad practical relevance.

Subjects

Gene Expression Regulation, Algorithms, Humans, Acute Myeloid Leukemia/genetics, Proof of Concept Study, Systems Biology

7.

Estimation of dynamic SNP-heritability with Bayesian Gaussian process models.

Arjas, Arttu; Hauptmann, Andreas; Sillanpää, Mikko J.

Bioinformatics ; 36(12): 3795-3802, 2020 06 01.

Article in English | MEDLINE | ID: mdl-32186692

ABSTRACT

MOTIVATION: Improved DNA technology has made it practical to estimate single-nucleotide polymorphism (SNP)-heritability among distantly related individuals with unknown relationships. For growth- and development-related traits, it is meaningful to base SNP-heritability estimation on longitudinal data due to the time-dependency of the process. However, only a few statistical methods have been developed so far for estimating dynamic SNP-heritability and quantifying its full uncertainty. RESULTS: We introduce a completely tuning-free Bayesian Gaussian process (GP)-based approach for estimating dynamic variance components and heritability as their function. For parameter estimation, we use a modern Markov Chain Monte Carlo method which allows full uncertainty quantification. Several datasets are analysed and our results clearly illustrate that the 95% credible intervals of the proposed joint estimation method (which 'borrows strength' from adjacent time points) are significantly narrower than those of a two-stage baseline method that first estimates the variance components at each time point independently and then performs smoothing. We compare the method with a random regression model using MTG2 and BLUPF90 software and quantitative measures indicate superior performance of our method. Results are presented for simulated and real data with up to 1000 time points. Finally, we demonstrate scalability of the proposed method for simulated data with tens of thousands of individuals. AVAILABILITY AND IMPLEMENTATION: The C++ implementation dynBGP and simulated data are available on GitHub: https://github.com/aarjas/dynBGP. The programmes can be run in R. Real datasets are available in QTL archive: https://phenome.jax.org/centers/QTLA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Single Nucleotide Polymorphism, Software, Bayes Theorem, Humans, Monte Carlo Method, Normal Distribution
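
Entry 7 describes heritability as a function of time-varying variance components. A minimal sketch of that relationship follows, assuming the common two-component definition h2(t) = var_g(t) / (var_g(t) + var_e(t)); the exact decomposition used in dynBGP is not given in the abstract, and the function name and numbers are hypothetical.

```python
def heritability_curve(var_g, var_e):
    """Pointwise heritability from genetic and residual variance curves.

    Assumes h2(t) = var_g(t) / (var_g(t) + var_e(t)) at each time point.
    This illustrates the definition only; it is not the dynBGP estimator.
    """
    if len(var_g) != len(var_e):
        raise ValueError("variance curves must have the same length")
    return [g / (g + e) for g, e in zip(var_g, var_e)]


# Hypothetical variance components at three time points.
genetic = [1.0, 2.0, 3.0]
residual = [1.0, 2.0, 1.0]
print(heritability_curve(genetic, residual))
```

In a Bayesian setting the same ratio would be computed for every posterior draw of the two curves, which gives a credible band for h2(t) rather than a single curve.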

8.

Dispersal in a house sparrow metapopulation: An integrative case study of genetic assignment calibrated with ecological data and pedigree information.

Saatoglu, Dilan; Niskanen, Alina K; Kuismin, Markku; Ranke, Peter S; Hagen, Ingerid J; Araya-Ajoy, Yimen G; Kvalnes, Thomas; Pärn, Henrik; Rønning, Bernt; Ringsby, Thor Harald; Saether, Bernt-Erik; Husby, Arild; Sillanpää, Mikko J; Jensen, Henrik.

Mol Ecol ; 30(19): 4740-4756, 2021 10.

Article in English | MEDLINE | ID: mdl-34270821

ABSTRACT

Dispersal has a crucial role in determining ecoevolutionary dynamics through both gene flow and population size regulation. However, to study dispersal and its consequences, one must distinguish immigrants from residents. Dispersers can be identified using telemetry, capture-mark-recapture (CMR) methods, or genetic assignment methods. All of these methods have disadvantages, such as high costs and substantial field efforts needed for telemetry and CMR surveys, and adequate genetic distance required in genetic assignment. In this study, we used genome-wide 200K Single Nucleotide Polymorphism data and two different genetic assignment approaches (GSI_SIM, Bayesian framework; BONE, network-based estimation) to identify the dispersers in a house sparrow (Passer domesticus) metapopulation sampled over 16 years. Our results showed higher assignment accuracy with BONE. Hence, we proceeded to diagnose potential sources of errors in the assignment results from the BONE method due to variation in levels of interpopulation genetic differentiation, intrapopulation genetic variation and sample size. We show that assignment accuracy is high even at low levels of genetic differentiation and that it increases with the proportion of a population that has been sampled. Finally, we highlight that dispersal studies integrating both ecological and genetic data provide robust assessments of the dispersal patterns in natural populations.

Subjects

Sparrows, Animals, Bayes Theorem, Genetic Drift, Pedigree, Population Density, Sparrows/genetics

9.

EDISON: An Edge-Native Method and Architecture for Distributed Interpolation.

Lovén, Lauri; Lähderanta, Tero; Ruha, Leena; Peltonen, Ella; Launonen, Ilkka; Sillanpää, Mikko J; Riekki, Jukka; Pirttikangas, Susanna.

Sensors (Basel) ; 21(7)2021 Mar 24.

Article in English | MEDLINE | ID: mdl-33805187

ABSTRACT

Spatio-temporal interpolation provides estimates of observations in unobserved locations and time slots. In smart cities, interpolation helps to provide a fine-grained contextual and situational understanding of the urban environment, in terms of both short-term (e.g., weather, air quality, traffic) and long-term (e.g., crime, demographics) spatio-temporal phenomena. Various initiatives improve spatio-temporal interpolation results by including additional data sources such as vehicle-fitted sensors, mobile phones, or micro weather stations of, for example, smart homes. However, the underlying computing paradigm in such initiatives is predominantly centralized, with all data collected and analyzed in the cloud. This solution is not scalable: as the spatial and temporal density of sensor data grows, the required transmission bandwidth and computational capacity become unfeasible. To address the scaling problem, we propose EDISON: algorithms for distributed learning and inference, and an edge-native architecture for distributing spatio-temporal interpolation models, their computations, and the observed data vertically and horizontally between device, edge and cloud layers. We demonstrate EDISON functionality in a controlled, simulated spatio-temporal setup with 1 M artificial data points. While the main motivation of EDISON is the distribution of the heavy computations, the results show that EDISON also provides an improvement over alternative approaches, reaching at best a 10% smaller RMSE than a global interpolation and 6% smaller RMSE than a baseline distributed approach.

10.

Genome-wide association study identified novel candidate loci affecting wood formation in Norway spruce.

Baison, John; Vidalis, Amaryllis; Zhou, Linghua; Chen, Zhi-Qiang; Li, Zitong; Sillanpää, Mikko J; Bernhardsson, Carolina; Scofield, Douglas; Forsberg, Nils; Grahn, Thomas; Olsson, Lars; Karlsson, Bo; Wu, Harry; Ingvarsson, Pär K; Lundqvist, Sven-Olof; Niittylä, Totte; García-Gil, M Rosario.

Plant J ; 100(1): 83-100, 2019 10.

Article in English | MEDLINE | ID: mdl-31166032

ABSTRACT

Norway spruce is a boreal forest tree species of significant ecological and economic importance. Hence there is a strong imperative to dissect the genetics underlying important wood quality traits in the species. We performed a functional genome-wide association study (GWAS) of 17 wood traits in Norway spruce using 178 101 single nucleotide polymorphisms (SNPs) generated from exome genotyping of 517 mother trees. The wood traits were defined using functional modelling of wood properties across annual growth rings. We applied a Least Absolute Shrinkage and Selection Operator (LASSO-based) association mapping method using a functional multilocus mapping approach that utilizes latent traits, with a stability selection probability method as the hypothesis testing approach to determine a significant quantitative trait locus. The analysis provided 52 significant SNPs from 39 candidate genes, including genes previously implicated in wood formation and tree growth in spruce and other species. Our study represents a multilocus GWAS for complex wood traits in Norway spruce. The results advance our understanding of the genetics influencing wood traits and identify candidate genes for future functional studies.

Subjects

Plant Genes/genetics, Genome-Wide Association Study/methods, Picea/genetics, Quantitative Trait Loci/genetics, Wood/genetics, Algorithms, Genomics/methods, Genotype, Linkage Disequilibrium, Norway, Phenotype, Picea/classification, Single Nucleotide Polymorphism, Wood/classification

11.

A Gaussian process model and Bayesian variable selection for mapping function-valued quantitative traits with incomplete phenotypic data.

Vanhatalo, Jarno; Li, Zitong; Sillanpää, Mikko J.

Bioinformatics ; 35(19): 3684-3692, 2019 10 01.

Article in English | MEDLINE | ID: mdl-30850830

ABSTRACT

MOTIVATION: Recent advances in high dimensional phenotyping bring time as an extra dimension into the phenotypes. This promotes quantitative trait locus (QTL) studies of function-valued traits such as those related to growth and development. Existing approaches for analyzing functional traits utilize either parametric methods or semi-parametric approaches based on splines and wavelets. However, very limited choices of software tools are currently available for practical implementation of functional QTL mapping and variable selection. RESULTS: We propose a Bayesian Gaussian process (GP) approach for functional QTL mapping. We use GPs to model the continuously varying coefficients which describe how the effects of molecular markers on the quantitative trait are changing over time. We use an efficient gradient based algorithm to estimate the tuning parameters of GPs. Notably, the GP approach is directly applicable to incomplete datasets, even those with a missing data rate above 50% (among phenotypes). We further develop a stepwise algorithm to search through the model space in terms of genetic variants, and use a minimal increase of Bayesian posterior probability as a stopping rule to focus on only a small set of putative QTL. We also discuss the connection between GP and penalized B-splines and wavelets. On two simulated and three real datasets, our GP approach demonstrates great flexibility for modeling different types of phenotypic trajectories with low computational cost. The proposed model selection approach finds the most likely QTL reliably in tested datasets. AVAILABILITY AND IMPLEMENTATION: Software and simulated data are available as a MATLAB package 'GPQTLmapping', and they can be downloaded from GitHub (https://github.com/jpvanhat/GPQTLmapping). Real datasets used in case studies are publicly available at QTL Archive. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Genetic Models, Quantitative Trait Loci, Animals, Bayes Theorem, Chromosome Mapping, Mice, Phenotype

12.

Effect of centralization on geographic accessibility of maternity hospitals in Finland.

Huotari, Tiina; Rusanen, Jarmo; Keistinen, Timo; Lähderanta, Tero; Ruha, Leena; Sillanpää, Mikko J; Antikainen, Harri.

BMC Health Serv Res ; 20(1): 337, 2020 Apr 21.

Article in English | MEDLINE | ID: mdl-32316970

ABSTRACT

BACKGROUND: In the past two decades, the number of maternity hospitals in Finland has been reduced from 42 to 22. Notwithstanding the benefits of centralization for larger units in terms of increased safety, the closures will inevitably impair geographical accessibility of services. METHODS: This study aimed to employ a set of location-allocation methods to assess the potential impact on accessibility, should the number of maternity hospitals be reduced from 22 to 16. Accurate population grid data combined with road network and hospital facilities data is analyzed with three different location-allocation methods: straight, sequential and capacitated p-median. RESULTS: Depending on the method used to assess the impact of further reduction in the number of maternity hospitals, 0.6 to 2.7% of mothers would have more than a two-hour travel time to the nearest maternity hospital, while the corresponding figure is 0.5 in the current situation. The analyses highlight the areas where the number of births is low, but a maternity hospital is still important in terms of accessibility, and the areas where even one unit would be enough to take care of a considerable volume of births. CONCLUSIONS: Even if the reduction in the number of hospitals might not drastically harm accessibility at the level of the entire population, considerable changes in accessibility can occur for clients living close to a maternity hospital facing closure. As different location-allocation analyses can result in different configurations of hospitals, decision-makers should be aware of their differences to ensure adequate accessibility for clients, especially in remote, sparsely populated areas.

Subjects

Centralized Hospital Services, Health Services Accessibility, Maternity Hospitals, Child, Preschool Child, Female, Finland, Health Care Reform, Health Facility Closure, Humans, Infant, Information Systems, Pregnancy, Travel
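
Entry 12 relies on p-median location-allocation. As a minimal sketch of the underlying objective, the code below solves a tiny uncapacitated p-median problem by exhaustive search: choose p sites so that the demand-weighted travel cost to the nearest chosen site is smallest. The study's straight, sequential and capacitated variants, its road-network travel times and its population grid are not reproduced here; the function name and all numbers are hypothetical.

```python
from itertools import combinations


def p_median(demand, cost, p):
    """Exhaustive uncapacitated p-median on a small instance.

    demand[i]  : weight (e.g. number of births) at demand point i
    cost[i][j] : travel cost from demand point i to candidate site j
    Returns the best tuple of site indices and its total weighted cost.
    """
    n_sites = len(cost[0])
    best_sites, best_total = None, float("inf")
    for sites in combinations(range(n_sites), p):
        # Each demand point is served by its nearest open site.
        total = sum(w * min(row[j] for j in sites) for w, row in zip(demand, cost))
        if total < best_total:
            best_sites, best_total = sites, total
    return best_sites, best_total


# Hypothetical instance: four demand points, three candidate hospital sites.
demand = [10, 5, 1, 8]
cost = [
    [0, 4, 9],
    [4, 0, 5],
    [9, 5, 0],
    [2, 3, 8],
]
print(p_median(demand, cost, 2))  # -> ((0, 1), 21)
print(p_median(demand, cost, 1))  # -> ((0,), 45)
```

Exhaustive search is only workable for toy instances, because the number of site combinations grows combinatorially; realistic problems need heuristics or integer programming.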

13.

Analysis of phenotypic- and Estimated Breeding Values (EBV) to dissect the genetic architecture of complex traits in a Scots pine three-generation pedigree design.

Calleja-Rodriguez, Ainhoa; Li, Zitong; Hallingbäck, Henrik R; Sillanpää, Mikko J; Wu, Harry X; Abrahamsson, Sara; García-Gil, Maria Rosario.

J Theor Biol ; 462: 283-292, 2019 02 07.

Article in English | MEDLINE | ID: mdl-30423305

ABSTRACT

In forest tree breeding, family-based Quantitative Trait Loci (QTL) studies are valuable as methods to dissect the complexity of a trait and as a source of candidate genes. In the field of conifer research, our study contributes to the evaluation of phenotypic and predicted breeding values for the identification of QTL linked to complex traits in a three-generation pedigree population in Scots pine (Pinus sylvestris L.). A total of 11 470 open pollinated F2-progeny trees established at three different locations were measured for growth and adaptive traits. Breeding values were predicted for their 360 mothers, originating from a single cross of two grand-parents. A multilevel LASSO association analysis was conducted to detect QTL using genotypes of the mothers with the corresponding phenotypes and Estimated Breeding Values (EBV). Different levels of genotype-by-environment (G × E) effects among sites at different years were detected for survival and height. Moderate-to-low narrow sense heritabilities and EBV accuracies were found for all traits and all sites. We identified 18 AFLPs and 12 SNPs to be associated with QTL for one or more traits. 62 QTL were significant with percentages of variance explained ranging from 1.7 to 18.9%. In those cases where the same marker was associated with a phenotypic or an ebvQTL, the ebvQTL always explained a higher proportion of the variance, maybe due to the more accurate nature of Estimated Breeding Values (EBV). Two SNP-QTL showed pleiotropic effects for traits related with hardiness, seed, cone and flower production. Furthermore, we detected several QTL with significant effects across multiple ages, which could be considered as strong candidate loci for early selection. The lack of reproducibility of some QTL detected across sites may be due to environmental heterogeneity reflected by the genotype- and QTL-by-environment effects.

Subjects

Breeding/methods, Pinus sylvestris/genetics, Quantitative Trait Loci/genetics, Gene-Environment Interaction, Pedigree, Phenotype, Single Nucleotide Polymorphism

14.

Estimating multilevel regional variation in excess mortality of cancer patients using integrated nested Laplace approximation.

Seppä, Karri; Rue, Håvard; Hakulinen, Timo; Läärä, Esa; Sillanpää, Mikko J; Pitkäniemi, Janne.

Stat Med ; 38(5): 778-791, 2019 02 28.

Article in English | MEDLINE | ID: mdl-30334278

ABSTRACT

Models of excess mortality with random effects were used to estimate regional variation in relative or net survival of cancer patients. Statistical inference for these models based on the Markov chain Monte Carlo (MCMC) methods is computationally intensive and, therefore, not feasible for routine analyses of cancer register data. This study assessed the performance of the integrated nested Laplace approximation (INLA) in monitoring regional variation in cancer survival. A Poisson regression model of excess mortality including both spatially correlated and unstructured random effects was fitted to the data of patients diagnosed with ovarian and breast cancer in Finland during 1955-2014 with follow-up from 1960 through 2014 by using the period approach with five-year calendar time windows. We estimated standard deviations associated with variation (i) between hospital districts and (ii) between municipalities within hospital districts. Posterior estimates based on the INLA approach were compared to those based on the MCMC simulation. The estimates of the variation parameters were similar between the two approaches. Variation within hospital districts dominated in the total variation between municipalities. In 2000-2014, the proportion of the average variation within hospital districts was 68% (95% posterior interval: 35%-93%) and 82% (60%-98%) out of the total variation in ovarian and breast cancer, respectively. In the estimation of regional variation, the INLA approach was accurate, fast, and easy to implement by using the R-INLA package.

Subjects

Breast Neoplasms/mortality, Demography/statistics & numerical data, Statistical Models, Ovarian Neoplasms/mortality, Small-Area Analysis, Survival Analysis, Cities/statistics & numerical data, Female, Finland, Hospitals/statistics & numerical data, Humans, Poisson Distribution, Registries

15.

Stochastic search variable selection based on two mixture components and continuous-scale weighting.

Rinta-Aho, Marko J; Sillanpää, Mikko J.

Biom J ; 61(3): 729-746, 2019 05.

Article in English | MEDLINE | ID: mdl-30537402

ABSTRACT

Stochastic search variable selection (SSVS) is a Bayesian variable selection method that employs covariate-specific discrete indicator variables to select which covariates (e.g., molecular markers) are included in or excluded from the model. We present a new variant of SSVS where, instead of discrete indicator variables, we use continuous-scale weighting variables (which can also take values between zero and one) to select covariates into the model. The improved model performance is shown and compared to standard SSVS using simulated and real quantitative trait locus mapping datasets. Decisions about phenotype-genotype associations in our SSVS variant are based on the median of the posterior distribution or on Bayes factors. We also show here that by using continuous-scale weighting variables it is possible to improve mixing properties of Markov chain Monte Carlo sampling substantially compared to standard SSVS. Also, the separation of association signals and nonsignals (control of noise level) seems to be more efficient compared to the standard SSVS. Thus, the novel method provides an efficient new framework for SSVS analysis that additionally provides the whole posterior distribution for pseudo-indicators, which means more information and may help in decision making.

Subjects

Biometry/methods, Algorithms, Bayes Theorem, Decision Making, Markov Chains, Monte Carlo Method, Probability, Software, Stochastic Processes

16.

A novel linkage-disequilibrium corrected genomic relationship matrix for SNP-heritability estimation and genomic prediction.

Mathew, Boby; Léon, Jens; Sillanpää, Mikko J.

Heredity (Edinb) ; 120(4): 356-368, 2018 04.

Article in English | MEDLINE | ID: mdl-29238077

ABSTRACT

Single nucleotide polymorphism (SNP)-heritability estimation is an important topic in several research fields, including animal, plant and human genetics, as well as in ecology. Linear mixed model estimation of SNP-heritability uses the structure of genomic relationships between individuals, which is constructed from genome-wide sets of SNP-markers that are generally weighted equally in their contributions. Proposed methods to handle dependence between SNPs include "thinning" the marker set by linkage disequilibrium (LD)-pruning, the use of haplotype-tagging of SNPs, and LD-weighting of the SNP-contributions. For improved estimation, we propose a new conceptual framework for genomic relationship matrix, in which Mahalanobis distance-based LD-correction is used in a linear mixed model estimation of SNP-heritability. The superiority of the presented method is illustrated and compared to mixed-model analyses using a VanRaden genomic relationship matrix, a matrix used by GCTA and a matrix employing LD-weighting (as implemented in the LDAK software) in simulated (using real human, rice and cattle genotypes) and real (maize, rice and mice) datasets. Despite the computational difficulties, our results suggest that by using the proposed method one can improve the accuracy of SNP-heritability estimates in datasets with high LD.

Subjects

Genomics/methods , Linkage Disequilibrium , Models, Genetic , Polymorphism, Single Nucleotide , Animals , Cattle , Genotype , Humans , Linear Models , Mice , Oryza , Software , Zea mays

17.

ACEt: An R Package for Estimating Dynamic Heritability and Comparing Twin Models.

He, Liang; Pitkäniemi, Janne; Silventoinen, Karri; Sillanpää, Mikko J.

Behav Genet ; 47(6): 620-641, 2017 11.

Article in English | MEDLINE | ID: mdl-28879484

ABSTRACT

Estimating dynamic effects of age on the genetic and environmental variance components in twin studies may contribute to the investigation of gene-environment interactions, and may provide insights into more accurate and powerful estimation of heritability. Existing parametric models for estimating dynamic variance components suffer from various drawbacks, such as the limitation to predefined functions. We present ACEt, an R package for fast estimation of dynamic variance components and heritability that may change with respect to age or other moderators. Building on the twin models using penalized splines, ACEt provides a unified framework to incorporate a class of ACE models, in which each component can be modeled independently and is not limited by a linear or quadratic function. We demonstrate that ACEt is robust against misspecification of the number of spline knots, and offers a refined resolution of dynamic behavior of the genetic and environmental components and thus a detailed estimation of age-specific heritability. Moreover, we develop resampling methods for testing twin models with different variance functions including splines, log-linearity and constancy, which can be easily employed to verify various model assumptions. We evaluated the type I error rate and statistical power of the proposed hypothesis testing procedures under various scenarios using simulated datasets. Potential numerical issues and computational cost were also assessed through simulations. We applied the ACEt package to a Finnish twin cohort to investigate age-specific heritability of body mass index and height. Our results show that the age-specific variance components of these two traits exhibited substantially different patterns despite comparable estimates of heritability. In summary, the ACEt R package offers a useful tool for the exploration of age-dependent heritability and model comparison in twin studies.

Subjects

Genome-Wide Association Study/methods , Genome-Wide Association Study/statistics & numerical data , Computer Simulation , Environment , Gene-Environment Interaction , Humans , Likelihood Functions , Models, Genetic , Quantitative Trait, Heritable , Software , Twins/genetics , Twins/statistics & numerical data

18.

Prioritizing covariates in the planning of future studies in the meta-analytic framework.

Karvanen, Juha; Sillanpää, Mikko J.

Biom J ; 59(1): 110-125, 2017 Jan.

Article in English | MEDLINE | ID: mdl-27740692

ABSTRACT

Science can be seen as a sequential process where each new study adds evidence to the existing knowledge. To have the best prospects of making an impact in this process, a new study should be designed optimally, taking into account the previous studies and other prior information. We propose a formal approach for covariate prioritization, that is, the decision about the covariates to be measured in a new study. The decision criteria can be based on conditional power, change of the p-value, change in lower confidence limit, Kullback-Leibler divergence, Bayes factors, Bayesian false discovery rate or difference between prior and posterior expectation. The criteria can also be used for decisions on the sample size. As an illustration, we consider covariate prioritization based on genome-wide association studies for C-reactive protein levels and make suggestions on the genes to be studied further.
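Two of the listed criteria, the Kullback-Leibler divergence and the difference between prior and posterior expectation, have simple closed forms when both distributions are approximated by univariate normals. The sketch below is a generic illustration under that normality assumption, not the authors' procedure; the function names and example numbers are hypothetical.

```python
import math

def kl_normal(mu_post, sd_post, mu_prior, sd_prior):
    """KL divergence KL(posterior || prior) for two univariate normals."""
    if sd_post <= 0 or sd_prior <= 0:
        raise ValueError("standard deviations must be positive")
    return (math.log(sd_prior / sd_post)
            + (sd_post ** 2 + (mu_post - mu_prior) ** 2) / (2.0 * sd_prior ** 2)
            - 0.5)

def expectation_shift(mu_post, mu_prior):
    """Absolute difference between posterior and prior expectation."""
    return abs(mu_post - mu_prior)

# Hypothetical covariate effect: prior N(0, 1), posterior N(0.4, 0.5^2).
print(kl_normal(0.4, 0.5, 0.0, 1.0), expectation_shift(0.4, 0.0))
```

Under this reading, a covariate whose anticipated posterior moves far from its prior (large divergence or large shift) would rank higher for measurement in the next study.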

Subjects

Meta-Analysis as Topic , Research Design/trends , Bayes Theorem , C-Reactive Protein/genetics , Genome-Wide Association Study , Probability , Sample Size

19.

Hierarchical Bayesian model for rare variant association analysis integrating genotype uncertainty in human sequence data.

He, Liang; Pitkäniemi, Janne; Sarin, Antti-Pekka; Salomaa, Veikko; Sillanpää, Mikko J; Ripatti, Samuli.

Genet Epidemiol ; 39(2): 89-100, 2015 Feb.

Article in English | MEDLINE | ID: mdl-25395270

ABSTRACT

Next-generation sequencing (NGS) has led to the study of rare genetic variants, which possibly explain the missing heritability for complex diseases. Most existing methods for rare variant (RV) association detection do not account for the common presence of sequencing errors in NGS data. The errors can largely affect the power and perturb the accuracy of association tests due to rare observations of minor alleles. We developed a hierarchical Bayesian approach to estimate the association between RVs and complex diseases. Our integrated framework combines the misclassification probability with shrinkage-based Bayesian variable selection. It allows for flexibility in handling neutral and protective RVs with measurement error, and is robust enough for detecting causal RVs with a wide spectrum of minor allele frequency (MAF). Imputation uncertainty and MAF are incorporated into the integrated framework to achieve the optimal statistical power. We demonstrate that sequencing error does significantly affect the findings, and our proposed model can take advantage of it to improve statistical power in both simulated and real data. We further show that our model outperforms existing methods, such as sequence kernel association test (SKAT). Finally, we illustrate the behavior of the proposed method using a Finnish low-density lipoprotein cholesterol study, and show that it identifies an RV known as FH North Karelia in LDLR gene with three carriers in 1,155 individuals, which is missed by both SKAT and Granvil.

Subjects

Genetic Variation/genetics , Genotype , Uncertainty , Alleles , Base Sequence , Bayes Theorem , Cholesterol, LDL/genetics , Finland , Gene Frequency , Genetic Predisposition to Disease/genetics , Heterozygote , High-Throughput Nucleotide Sequencing , Humans , Models, Genetic , Receptors, LDL/genetics , Software

20.

Prediction of complex human diseases from pathway-focused candidate markers by joint estimation of marker effects: case of chronic fatigue syndrome.

Bhattacharjee, Madhuchhanda; Rajeevan, Mangalathu S; Sillanpää, Mikko J.

Hum Genomics ; 9: 8, 2015 Jun 11.

Article in English | MEDLINE | ID: mdl-26063326

ABSTRACT

BACKGROUND: The current practice of using only a few strongly associated genetic markers in regression models results in generally low power in prediction or accounting for heritability of complex human traits. PURPOSE: We illustrate here a Bayesian joint estimation of single nucleotide polymorphism (SNP) effects principle to improve prediction of phenotype status from pathway-focused sets of SNPs. Chronic fatigue syndrome (CFS), a complex disease of unknown etiology with no laboratory methods for diagnosis, was chosen to demonstrate the power of this Bayesian method. For CFS, such a genetic predictive model in combination with clinical evidence might lead to an earlier diagnosis than one based solely on clinical findings. METHODS: One of our goals is to model disease status using Bayesian statistics which perform variable selection and parameter estimation simultaneously and which can induce the sparseness and smoothness of the SNP effects. Smoothness of the SNP effects is obtained by explicit modeling of the covariance structure of the SNP effects. RESULTS: The Bayesian model achieved perfect goodness of fit when tested within the sampled data. Tenfold cross-validation resulted in 80% accuracy, one of the best so far for CFS in comparison to previous prediction models. Model reduction aspects were investigated in a computationally feasible manner. Additionally, genetic variation estimates provided by the model identified specific genetic markers for their biological role in the disease pathophysiology. CONCLUSIONS: This proof-of-principle study provides a powerful approach combining Bayesian methods, SNPs representing multiple pathways and rigorous case ascertainment for accurate genetic risk prediction modeling of complex diseases like CFS and other chronic diseases.

Subjects

Biosynthetic Pathways/genetics , Fatigue Syndrome, Chronic/genetics , Genetic Markers , Models, Genetic , Adolescent , Adult , Aged , Bayes Theorem , Fatigue Syndrome, Chronic/pathology , Female , Genotype , Humans , Male , Middle Aged , Phenotype , Polymorphism, Single Nucleotide

