Results 1 - 20 of 85
1.
Bioinformatics ; 39(11)2023 11 01.
Article in English | MEDLINE | ID: mdl-37963057

ABSTRACT

MOTIVATION: Due to advances in measuring technology, many new phenotype, gene expression, and other omics time-course datasets are now commonly available. Cluster analysis may provide useful information about the structure of such data. RESULTS: In this work, we propose BELMM (Bayesian Estimation of Latent Mixture Models): a flexible framework for analysing, clustering, and modelling time-series data in a Bayesian setting. The framework is built on mixture modelling: first, the mean curves of the mixture components are assumed to follow random walk smoothing priors. Second, we choose the most plausible model and the number of mixture components using the Reversible-jump Markov chain Monte Carlo. Last, we assign the individual time series into clusters based on the similarity to the cluster-specific trend curves determined by the latent random walk processes. We demonstrate the use of fast and slow implementations of our approach on both simulated and real time-series data using widely available software R, Stan, and CU-MSDSp. AVAILABILITY AND IMPLEMENTATION: The French mortality dataset is available at http://www.mortality.org, the Drosophila melanogaster embryogenesis gene expression data at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE121160. Details on our simulated datasets are available in the Supplementary Material, and R scripts and a detailed tutorial on GitHub at https://github.com/ollisa/BELMM. The software CU-MSDSp is available on GitHub at https://github.com/jtchavisIII/CU-MSDSp.
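
The final step described in the abstract above, assigning each time series to the cluster whose trend curve it most resembles, can be illustrated with a deliberately simplified sketch. It assumes the cluster trend curves are already known and uses the sum of squared differences as the similarity measure; both are assumptions made here for illustration, and the function name `assign_to_clusters` is hypothetical. It is not the BELMM implementation, which works with posterior quantities from the fitted mixture model.

```python
def assign_to_clusters(series, trend_curves):
    """Assign each time series to the index of its closest trend curve.

    Closeness is the sum of squared differences over time points
    (an assumption for this sketch, not the BELMM similarity measure).
    """
    labels = []
    for y in series:
        # Distance from this series to every candidate trend curve.
        distances = [
            sum((yi - mi) ** 2 for yi, mi in zip(y, curve))
            for curve in trend_curves
        ]
        labels.append(distances.index(min(distances)))
    return labels


# Two hypothetical trend curves: one increasing, one decreasing.
trend_curves = [[0.0, 1.0, 2.0, 3.0], [3.0, 2.0, 1.0, 0.0]]
series = [
    [0.1, 0.9, 2.2, 2.8],
    [2.9, 2.1, 0.8, 0.2],
    [0.0, 1.2, 1.9, 3.1],
]
labels = assign_to_clusters(series, trend_curves)
```

In a full Bayesian treatment the assignment would be probabilistic rather than a hard nearest-curve rule.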


Subject(s)
Drosophila melanogaster; Software; Animals; Bayes Theorem; Drosophila melanogaster/genetics; Time Factors; Cluster Analysis
2.
Bioinformatics ; 39(7)2023 07 01.
Article in English | MEDLINE | ID: mdl-37348543

ABSTRACT

MOTIVATION: Genome-wide association studies (GWAS) have been successful in identifying genomic loci associated with complex traits. Genetic fine-mapping aims to detect independent causal variants from the GWAS-identified loci, adjusting for linkage disequilibrium patterns. RESULTS: We present "FiniMOM" (fine-mapping using a product inverse-moment prior), a novel Bayesian fine-mapping method for summarized genetic associations. For causal effects, the method uses a nonlocal inverse-moment prior, which is a natural prior distribution to model non-null effects in finite samples. A beta-binomial prior is set for the number of causal variants, with a parameterization that can be used to control for potential misspecifications in the linkage disequilibrium reference. The results of simulation studies designed to mimic a typical GWAS on circulating protein levels show improved credible set coverage and power of the proposed method over the current state-of-the-art fine-mapping method SuSiE, especially in the case of multiple causal variants within a locus. AVAILABILITY AND IMPLEMENTATION: https://vkarhune.github.io/finimom/.


Subject(s)
Genome-Wide Association Study; Genomics; Genome-Wide Association Study/methods; Bayes Theorem; Chromosome Mapping/methods; Linkage Disequilibrium; Polymorphism, Single Nucleotide
3.
Mol Ecol ; 33(6): e17299, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38380534

ABSTRACT

Additive and dominance genetic variances underlying the expression of quantitative traits are important quantities for predicting short-term responses to selection, but they are notoriously challenging to estimate in most non-model wild populations. Specifically, large-sized or panmictic populations may be characterized by low variance in genetic relatedness among individuals which, in turn, can prevent accurate estimation of quantitative genetic parameters. We used estimates of genome-wide identity-by-descent (IBD) sharing from autosomal SNP loci to estimate quantitative genetic parameters for ecologically important traits in nine-spined sticklebacks (Pungitius pungitius) from a large, outbred population. Using empirical and simulated datasets, with varying sample sizes and pedigree complexity, we assessed the performance of different crossing schemes in estimating additive genetic variance and heritability for all traits. We found that low variance in relatedness characteristic of wild outbred populations with high migration rate can impair the estimation of quantitative genetic parameters and bias heritability estimates downwards. On the other hand, the use of a half-sib/full-sib design allowed precise estimation of genetic variance components and revealed significant additive variance and heritability for all measured traits, with negligible dominance contributions. Genome-partitioning and QTL mapping analyses revealed that most traits had a polygenic basis and were controlled by genes at multiple chromosomes. Furthermore, different QTL contributed to variation in the same traits in different populations suggesting heterogeneous underpinnings of parallel evolution at the phenotypic level. Our results provide important guidelines for future studies aimed at estimating adaptive potential in the wild, particularly for those conducted in outbred large-sized populations.


Subject(s)
Genome; Multifactorial Inheritance; Humans; Genome/genetics; Chromosome Mapping; Phenotype; Models, Genetic; Polymorphism, Single Nucleotide/genetics
4.
Proc Biol Sci ; 289(1975): 20220352, 2022 05 25.
Article in English | MEDLINE | ID: mdl-35582807

ABSTRACT

Heritable variation in traits under natural selection is a prerequisite for evolutionary response. While it is recognized that trait heritability may vary spatially and temporally depending on which environmental conditions traits are expressed under, less is known about the possibility that genetic variance contributing to the expected selection response in a given trait may vary at different stages of ontogeny. Specifically, it is unclear whether different loci underlie the expression of a trait throughout development and thus provide an additional source of variation for selection to act on in the wild. Here we show that body size, an important life-history trait, is heritable throughout ontogeny in the nine-spined stickleback (Pungitius pungitius). Nevertheless, both analyses of quantitative trait loci and genetic correlations across ages show that different chromosomes/loci contribute to this heritability at different ontogenetic time points. This suggests that body size can respond to selection at different stages of ontogeny but that this response is determined by different loci at different points of development. Hence, our study provides important results regarding our understanding of the genetics of ontogeny and opens an interesting avenue of research for studying age-specific genetic architecture as a source of non-parallel evolution.


Subject(s)
Smegmamorpha; Animals; Body Size/genetics; Genetic Variation; Phenotype; Quantitative Trait Loci; Selection, Genetic; Smegmamorpha/physiology
5.
Bioinformatics ; 37(5): 726-727, 2021 05 05.
Article in English | MEDLINE | ID: mdl-32805018

ABSTRACT

MOTIVATION: Graphical lasso (Glasso) is a widely used tool for identifying gene regulatory networks in systems biology. However, its computational efficiency depends on the choice of regularization parameter (tuning parameter), and selecting this parameter can be highly time consuming. Although fully Bayesian implementations of Glasso alleviate this problem somewhat by specifying a prior distribution for the parameter, these approaches lack the scalability of their frequentist counterparts. RESULTS: Here, we present a new Monte Carlo Penalty Selection method (MCPeSe), a computationally efficient approach to regularization parameter selection for Glasso. MCPeSe combines the scalability and low computational cost of the frequentist Glasso with the ability to automatically choose the regularization by Bayesian Glasso modeling. MCPeSe provides a state-of-the-art 'tuning-free' model selection criterion for Glasso and allows exploration of the posterior probability distribution of the tuning parameter. AVAILABILITY AND IMPLEMENTATION: R source code of MCPeSe, a step-by-step example showing how to apply MCPeSe and a collection of scripts used to prepare the material in this article are publicly available at GitHub under GPL (https://github.com/markkukuismin/MCPeSe/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Software; Systems Biology; Bayes Theorem; Monte Carlo Method; Probability
6.
PLoS Comput Biol ; 17(5): e1008960, 2021 05.
Article in English | MEDLINE | ID: mdl-33939702

ABSTRACT

A wide variety of 1) parametric regression models and 2) co-expression networks have been developed for finding gene-by-gene interactions underlying complex traits from expression data. While both methodological schemes have their own well-known benefits, little is known about their synergistic potential. Our study introduces their methodological fusion that cross-exploits the strengths of individual approaches via a built-in information-sharing mechanism. This fusion is theoretically based on certain trait-conditioned dependency patterns between two genes depending on their role in the underlying parametric model. The resulting trait-specific co-expression network estimation method 1) serves to enhance the interpretation of biological networks in a parametric sense, and 2) exploits the underlying parametric model itself in the estimation process. To also account for the substantial amount of intrinsic noise and collinearities, often entailed by expression data, a tailored co-expression measure is introduced along with this framework to alleviate related computational problems. A remarkable advance over the reference methods in simulated scenarios substantiates the method's high efficiency. As a proof of concept, this synergistic approach is successfully applied in survival analysis, with acute myeloid leukemia data, further highlighting the framework's versatility and broad practical relevance.


Subject(s)
Gene Expression Regulation; Algorithms; Humans; Leukemia, Myeloid, Acute/genetics; Proof of Concept Study; Systems Biology
7.
J Stroke Cerebrovasc Dis ; 31(8): 106603, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35749938

ABSTRACT

OBJECTIVES: Selected patients with acute ischemic stroke (AIS) caused by proximal middle cerebral artery (MCA) or internal carotid artery occlusion benefit from endovascular thrombectomy (EVT) in the extended time window (6-24 h from last seen well) based on two landmark randomized controlled trials (RCTs), DAWN and DEFUSE-3. We evaluated patient outcomes in real life, with a focus on adherence to the protocols of the two RCTs. MATERIALS AND METHODS: We included consecutive patients with AIS (excluding basilar artery occlusions) referred to EVT in our stroke center in the extended time window between January 2018 and December 2019 and compared the outcome of patients who fulfilled the criteria of the RCTs with those who did not. RESULTS: Of the total of 100 patients, 23 complied with the RCTs' criteria and 18 presented with minor non-adherence (lower NIHSS score or longer treatment delay), whereas 22 patients had large baseline ischemia (>1/3 MCA), 28 presented with M2 and more distal occlusions, and 9 patients did not undergo perfusion imaging prior to EVT. Good 3-month outcome (modified Rankin Scale 0-2) was observed in 54% of those who either met the RCT criteria or presented with lower NIHSS score or longer treatment delay, but only in 30% of M2 occlusions, and in none of the patients with large baseline ischemia. CONCLUSIONS: Our findings highlight the impact of mostly large baseline ischemia but also vessel status when selecting patients for EVT in the extended time window and emphasize the need for further data in these patient subgroups.


Subject(s)
Brain Ischemia; Endovascular Procedures; Ischemic Stroke; Stroke; Brain Ischemia/diagnostic imaging; Brain Ischemia/therapy; Endovascular Procedures/adverse effects; Endovascular Procedures/methods; Humans; Ischemic Stroke/diagnostic imaging; Ischemic Stroke/therapy; Stroke/diagnostic imaging; Stroke/therapy; Thrombectomy/adverse effects; Thrombectomy/methods; Treatment Outcome
8.
Bioinformatics ; 36(12): 3795-3802, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32186692

ABSTRACT

MOTIVATION: Improved DNA technology has made it practical to estimate single-nucleotide polymorphism (SNP)-heritability among distantly related individuals with unknown relationships. For growth- and development-related traits, it is meaningful to base SNP-heritability estimation on longitudinal data due to the time-dependency of the process. However, only a few statistical methods have been developed so far for estimating dynamic SNP-heritability and quantifying its full uncertainty. RESULTS: We introduce a completely tuning-free Bayesian Gaussian process (GP)-based approach for estimating dynamic variance components and heritability as their function. For parameter estimation, we use a modern Markov Chain Monte Carlo method which allows full uncertainty quantification. Several datasets are analysed and our results clearly illustrate that the 95% credible intervals of the proposed joint estimation method (which 'borrows strength' from adjacent time points) are significantly narrower than those of a two-stage baseline method that first estimates the variance components at each time point independently and then performs smoothing. We compare the method with a random regression model using MTG2 and BLUPF90 software, and quantitative measures indicate superior performance of our method. Results are presented for simulated and real data with up to 1000 time points. Finally, we demonstrate scalability of the proposed method for simulated data with tens of thousands of individuals. AVAILABILITY AND IMPLEMENTATION: The C++ implementation dynBGP and simulated data are available in GitHub: https://github.com/aarjas/dynBGP. The programmes can be run in R. Real datasets are available in QTL archive: https://phenome.jax.org/centers/QTLA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
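
The phrase "heritability as their function" in the abstract above can be made concrete with the standard definition of heritability as the genetic share of total variance, evaluated at each time point. The sketch below assumes only two variance components (genetic and residual) that have already been estimated; the function name `heritability_curve` and the numbers are hypothetical, and this is not the dynBGP code.

```python
def heritability_curve(genetic_var, residual_var):
    """Pointwise heritability h2(t) = var_g(t) / (var_g(t) + var_e(t)).

    Both inputs are sequences of non-negative variance estimates,
    one value per time point.
    """
    return [g / (g + e) for g, e in zip(genetic_var, residual_var)]


# Hypothetical variance components at three time points.
genetic_var = [1.0, 2.0, 3.0]
residual_var = [3.0, 2.0, 1.0]
h2 = heritability_curve(genetic_var, residual_var)
```

In a Bayesian analysis the same transformation would typically be applied to every posterior draw of the variance components, so that credible intervals for the heritability curve follow directly from the draws.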


Subject(s)
Polymorphism, Single Nucleotide; Software; Bayes Theorem; Humans; Monte Carlo Method; Normal Distribution
9.
Mol Ecol ; 30(19): 4740-4756, 2021 10.
Article in English | MEDLINE | ID: mdl-34270821

ABSTRACT

Dispersal has a crucial role determining ecoevolutionary dynamics through both gene flow and population size regulation. However, to study dispersal and its consequences, one must distinguish immigrants from residents. Dispersers can be identified using telemetry, capture-mark-recapture (CMR) methods, or genetic assignment methods. All of these methods have disadvantages, such as high costs and substantial field efforts needed for telemetry and CMR surveys, and adequate genetic distance required in genetic assignment. In this study, we used genome-wide 200K Single Nucleotide Polymorphism data and two different genetic assignment approaches (GSI_SIM, Bayesian framework; BONE, network-based estimation) to identify the dispersers in a house sparrow (Passer domesticus) metapopulation sampled over 16 years. Our results showed higher assignment accuracy with BONE. Hence, we proceeded to diagnose potential sources of errors in the assignment results from the BONE method due to variation in levels of interpopulation genetic differentiation, intrapopulation genetic variation and sample size. We show that assignment accuracy is high even at low levels of genetic differentiation and that it increases with the proportion of a population that has been sampled. Finally, we highlight that dispersal studies integrating both ecological and genetic data provide robust assessments of the dispersal patterns in natural populations.


Subject(s)
Sparrows; Animals; Bayes Theorem; Gene Flow; Pedigree; Population Density; Sparrows/genetics
10.
Sensors (Basel) ; 21(7)2021 Mar 24.
Article in English | MEDLINE | ID: mdl-33805187

ABSTRACT

Spatio-temporal interpolation provides estimates of observations in unobserved locations and time slots. In smart cities, interpolation helps to provide a fine-grained contextual and situational understanding of the urban environment, in terms of both short-term (e.g., weather, air quality, traffic) and long-term (e.g., crime, demographics) spatio-temporal phenomena. Various initiatives improve spatio-temporal interpolation results by including additional data sources such as vehicle-fitted sensors, mobile phones, or micro weather stations of, for example, smart homes. However, the underlying computing paradigm in such initiatives is predominantly centralized, with all data collected and analyzed in the cloud. This solution is not scalable: as the spatial and temporal density of sensor data grows, the required transmission bandwidth and computational capacity become infeasible. To address the scaling problem, we propose EDISON: algorithms for distributed learning and inference, and an edge-native architecture for distributing spatio-temporal interpolation models, their computations, and the observed data vertically and horizontally between device, edge and cloud layers. We demonstrate EDISON functionality in a controlled, simulated spatio-temporal setup with 1 M artificial data points. While the main motivation of EDISON is the distribution of the heavy computations, the results show that EDISON also provides an improvement over alternative approaches, reaching at best a 10% smaller RMSE than a global interpolation and 6% smaller RMSE than a baseline distributed approach.
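
The comparison reported at the end of the abstract above ("10% smaller RMSE") rests on two simple quantities: the root mean squared error of each interpolator and the relative reduction between them. A minimal sketch with made-up numbers follows; the function names and data are hypothetical and nothing here comes from the EDISON code.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def relative_rmse_reduction(candidate_rmse, reference_rmse):
    """Fraction by which the candidate RMSE is smaller than the reference."""
    return (reference_rmse - candidate_rmse) / reference_rmse


# Hypothetical held-out observations and two sets of interpolated values.
observed = [1.0, 2.0, 3.0, 4.0]
global_pred = [2.0, 3.0, 4.0, 5.0]       # reference interpolator
distributed_pred = [1.9, 2.9, 3.9, 4.9]  # candidate interpolator

reference = rmse(global_pred, observed)
candidate = rmse(distributed_pred, observed)
reduction = relative_rmse_reduction(candidate, reference)
```

With these invented values the candidate's RMSE is about 10% below the reference; the figures are chosen only to mirror the scale of the reported improvement.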

11.
Plant J ; 100(1): 83-100, 2019 10.
Article in English | MEDLINE | ID: mdl-31166032

ABSTRACT

Norway spruce is a boreal forest tree species of significant ecological and economic importance. Hence there is a strong imperative to dissect the genetics underlying important wood quality traits in the species. We performed a functional genome-wide association study (GWAS) of 17 wood traits in Norway spruce using 178 101 single nucleotide polymorphisms (SNPs) generated from exome genotyping of 517 mother trees. The wood traits were defined using functional modelling of wood properties across annual growth rings. We applied a Least Absolute Shrinkage and Selection Operator (LASSO-based) association mapping method using a functional multilocus mapping approach that utilizes latent traits, with a stability selection probability method as the hypothesis testing approach to determine a significant quantitative trait locus. The analysis provided 52 significant SNPs from 39 candidate genes, including genes previously implicated in wood formation and tree growth in spruce and other species. Our study represents a multilocus GWAS for complex wood traits in Norway spruce. The results advance our understanding of the genetics influencing wood traits and identify candidate genes for future functional studies.


Subject(s)
Genes, Plant/genetics; Genome-Wide Association Study/methods; Picea/genetics; Quantitative Trait Loci/genetics; Wood/genetics; Algorithms; Genomics/methods; Genotype; Linkage Disequilibrium; Norway; Phenotype; Picea/classification; Polymorphism, Single Nucleotide; Wood/classification
12.
Bioinformatics ; 35(19): 3684-3692, 2019 10 01.
Article in English | MEDLINE | ID: mdl-30850830

ABSTRACT

MOTIVATION: Recent advances in high dimensional phenotyping bring time as an extra dimension into the phenotypes. This promotes the quantitative trait locus (QTL) studies of function-valued traits such as those related to growth and development. Existing approaches for analyzing functional traits utilize either parametric methods or semi-parametric approaches based on splines and wavelets. However, very few software tools are currently available for practical implementation of functional QTL mapping and variable selection. RESULTS: We propose a Bayesian Gaussian process (GP) approach for functional QTL mapping. We use GPs to model the continuously varying coefficients which describe how the effects of molecular markers on the quantitative trait are changing over time. We use an efficient gradient based algorithm to estimate the tuning parameters of GPs. Notably, the GP approach is directly applicable to incomplete datasets, even those with a missing data rate larger than 50% (among phenotypes). We further develop a stepwise algorithm to search through the model space in terms of genetic variants, and use a minimal increase of Bayesian posterior probability as a stopping rule to focus on only a small set of putative QTL. We also discuss the connection between GP and penalized B-splines and wavelets. On two simulated and three real datasets, our GP approach demonstrates great flexibility for modeling different types of phenotypic trajectories with low computational cost. The proposed model selection approach finds the most likely QTL reliably in tested datasets. AVAILABILITY AND IMPLEMENTATION: Software and simulated data are available as a MATLAB package 'GPQTLmapping', and they can be downloaded from GitHub (https://github.com/jpvanhat/GPQTLmapping). Real datasets used in case studies are publicly available at QTL Archive. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Models, Genetic; Quantitative Trait Loci; Animals; Bayes Theorem; Chromosome Mapping; Mice; Phenotype
13.
BMC Health Serv Res ; 20(1): 337, 2020 Apr 21.
Article in English | MEDLINE | ID: mdl-32316970

ABSTRACT

BACKGROUND: In the past two decades, the number of maternity hospitals in Finland has been reduced from 42 to 22. Notwithstanding the benefits of centralization for larger units in terms of increased safety, the closures will inevitably impair geographical accessibility of services. METHODS: This study aimed to employ a set of location-allocation methods to assess the potential impact on accessibility, should the number of maternity hospitals be reduced from 22 to 16. Accurate population grid data combined with road network and hospital facilities data are analyzed with three different location-allocation methods: straight, sequential and capacitated p-median. RESULTS: Depending on the method used to assess the impact of further reduction in the number of maternity hospitals, 0.6 to 2.7% of mothers would have more than a two-hour travel time to the nearest maternity hospital, while the corresponding figure is 0.5% in the current situation. The analyses highlight the areas where the number of births is low, but a maternity hospital is still important in terms of accessibility, and the areas where even one unit would be enough to take care of a considerable volume of births. CONCLUSIONS: Even if the reduction in the number of hospitals might not drastically harm accessibility at the level of the entire population, considerable changes in accessibility can occur for clients living close to a maternity hospital facing closure. As different location-allocation analyses can result in different configurations of hospitals, decision-makers should be aware of their differences to ensure adequate accessibility for clients, especially in remote, sparsely populated areas.
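
The p-median problem named in the abstract above asks which p facility sites minimise total demand-weighted travel time when every demand point uses its nearest open site. The sketch below solves a tiny uncapacitated instance by exhaustive search. The travel times, demand weights and the function name `p_median` are invented for illustration, and the code does not reproduce the study's straight, sequential or capacitated variants, which run on real road-network data.

```python
from itertools import combinations


def p_median(travel_time, demand, p):
    """Exhaustively solve a small uncapacitated p-median problem.

    travel_time[i][j] is the travel time from demand point i to
    candidate site j; demand[i] is the weight (e.g. births) at point i.
    Returns the best tuple of site indices and its weighted cost.
    """
    n_sites = len(travel_time[0])
    best_sites, best_cost = None, float("inf")
    for sites in combinations(range(n_sites), p):
        # Each demand point is served by its nearest open site.
        cost = sum(
            weight * min(row[j] for j in sites)
            for row, weight in zip(travel_time, demand)
        )
        if cost < best_cost:
            best_sites, best_cost = sites, cost
    return best_sites, best_cost


# Hypothetical instance: 4 demand points, 3 candidate sites, minutes.
travel_time = [
    [10, 60, 120],
    [20, 50, 110],
    [90, 30, 40],
    [130, 70, 15],
]
demand = [100, 50, 30, 20]
best_sites, best_cost = p_median(travel_time, demand, p=2)
```

Exhaustive search grows combinatorially with the number of candidate sites, so realistic instances are normally handled with heuristics or integer programming solvers rather than enumeration.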


Subject(s)
Centralized Hospital Services; Health Services Accessibility; Hospitals, Maternity; Child; Child, Preschool; Female; Finland; Health Care Reform; Health Facility Closure; Humans; Infant; Information Systems; Pregnancy; Travel
14.
J Theor Biol ; 462: 283-292, 2019 02 07.
Article in English | MEDLINE | ID: mdl-30423305

ABSTRACT

In forest tree breeding, family-based Quantitative Trait Loci (QTL) studies are valuable as methods to dissect the complexity of a trait and as a source of candidate genes. In the field of conifer research, our study contributes to the evaluation of phenotypic and predicted breeding values for the identification of QTL linked to complex traits in a three-generation pedigree population in Scots pine (Pinus sylvestris L.). A total of 11 470 open pollinated F2-progeny trees established at three different locations were measured for growth and adaptive traits. Breeding values were predicted for their 360 mothers, originating from a single cross of two grand-parents. A multilevel LASSO association analysis was conducted to detect QTL using genotypes of the mothers with the corresponding phenotypes and Estimated Breeding Values (EBV). Different levels of genotype-by-environment (G × E) effects among sites at different years were detected for survival and height. Moderate-to-low narrow sense heritabilities and EBV accuracies were found for all traits and all sites. We identified 18 AFLPs and 12 SNPs to be associated with QTL for one or more traits. 62 QTL were significant with percentages of variance explained ranging from 1.7 to 18.9%. In those cases where the same marker was associated with a phenotypic or an ebvQTL, the ebvQTL always explained a higher proportion of the variance, possibly due to the more accurate nature of Estimated Breeding Values (EBV). Two SNP-QTL showed pleiotropic effects for traits related to hardiness, seed, cone and flower production. Furthermore, we detected several QTL with significant effects across multiple ages, which could be considered as strong candidate loci for early selection. The lack of reproducibility of some QTL detected across sites may be due to environmental heterogeneity reflected by the genotype- and QTL-by-environment effects.


Subject(s)
Breeding/methods; Pinus sylvestris/genetics; Quantitative Trait Loci/genetics; Gene-Environment Interaction; Pedigree; Phenotype; Polymorphism, Single Nucleotide
15.
Stat Med ; 38(5): 778-791, 2019 02 28.
Article in English | MEDLINE | ID: mdl-30334278

ABSTRACT

Models of excess mortality with random effects were used to estimate regional variation in relative or net survival of cancer patients. Statistical inference for these models based on the Markov chain Monte Carlo (MCMC) methods is computationally intensive and, therefore, not feasible for routine analyses of cancer register data. This study assessed the performance of the integrated nested Laplace approximation (INLA) in monitoring regional variation in cancer survival. A Poisson regression model of excess mortality including both spatially correlated and unstructured random effects was fitted to the data of patients diagnosed with ovarian and breast cancer in Finland during 1955-2014 with follow-up from 1960 through 2014 by using the period approach with five-year calendar time windows. We estimated standard deviations associated with variation (i) between hospital districts and (ii) between municipalities within hospital districts. Posterior estimates based on the INLA approach were compared to those based on the MCMC simulation. The estimates of the variation parameters were similar between the two approaches. Variation within hospital districts dominated in the total variation between municipalities. In 2000-2014, the proportion of the average variation within hospital districts was 68% (95% posterior interval: 35%-93%) and 82% (60%-98%) out of the total variation in ovarian and breast cancer, respectively. In the estimation of regional variation, the INLA approach was accurate, fast, and easy to implement by using the R-INLA package.


Subject(s)
Breast Neoplasms/mortality; Demography/statistics & numerical data; Models, Statistical; Ovarian Neoplasms/mortality; Small-Area Analysis; Survival Analysis; Cities/statistics & numerical data; Female; Finland; Hospitals/statistics & numerical data; Humans; Poisson Distribution; Registries
16.
Biom J ; 61(3): 729-746, 2019 05.
Article in English | MEDLINE | ID: mdl-30537402

ABSTRACT

Stochastic search variable selection (SSVS) is a Bayesian variable selection method that employs covariate-specific discrete indicator variables to select which covariates (e.g., molecular markers) are included in or excluded from the model. We present a new variant of SSVS where, instead of discrete indicator variables, we use continuous-scale weighting variables (which can also take values between zero and one) to select covariates into the model. The improved model performance is shown and compared to standard SSVS using simulated and real quantitative trait locus mapping datasets. In our SSVS variant, decisions about phenotype-genotype associations are based on the median of the posterior distribution or on Bayes factors. We also show here that by using continuous-scale weighting variables it is possible to improve mixing properties of Markov chain Monte Carlo sampling substantially compared to standard SSVS. Also, the separation of association signals and nonsignals (control of noise level) seems to be more efficient compared to the standard SSVS. Thus, the novel method provides an efficient new framework for SSVS analysis that additionally yields the whole posterior distribution of the pseudo-indicators, which carries more information and may help in decision making.
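
For readers unfamiliar with how SSVS output is turned into decisions, the sketch below shows the standard discrete-indicator case that the abstract above takes as its starting point: the posterior inclusion probability of a covariate is the fraction of MCMC draws in which its indicator equals one, and a Bayes factor for inclusion is the ratio of posterior to prior odds. The draws, the prior probability and the function name are hypothetical; the continuous-weight variant proposed in the paper is not reproduced here.

```python
def inclusion_bayes_factor(indicator_draws, prior_inclusion_prob):
    """Posterior inclusion probability and Bayes factor for one covariate.

    indicator_draws holds 0/1 MCMC draws of the covariate's indicator.
    The Bayes factor is posterior odds divided by prior odds; it is
    infinite if the covariate is included in every draw.
    """
    post = sum(indicator_draws) / len(indicator_draws)
    prior_odds = prior_inclusion_prob / (1.0 - prior_inclusion_prob)
    if post == 1.0:
        return post, float("inf")
    posterior_odds = post / (1.0 - post)
    return post, posterior_odds / prior_odds


# Hypothetical indicator draws for a single marker, prior probability 0.2.
draws = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
post_prob, bayes_factor = inclusion_bayes_factor(draws, 0.2)
```

Here the marker is included in 8 of 10 draws, so the posterior odds of 4 against prior odds of 0.25 give a Bayes factor of about 16.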


Subject(s)
Biometry/methods; Algorithms; Bayes Theorem; Decision Making; Markov Chains; Monte Carlo Method; Probability; Software; Stochastic Processes
17.
Heredity (Edinb) ; 120(4): 356-368, 2018 04.
Article in English | MEDLINE | ID: mdl-29238077

ABSTRACT

Single nucleotide polymorphism (SNP)-heritability estimation is an important topic in several research fields, including animal, plant and human genetics, as well as in ecology. Linear mixed model estimation of SNP-heritability uses the structure of genomic relationships between individuals, which is constructed from genome-wide sets of SNP-markers that are generally weighted equally in their contributions. Proposed methods to handle dependence between SNPs include "thinning" the marker set by linkage disequilibrium (LD)-pruning, the use of haplotype-tagging of SNPs, and LD-weighting of the SNP-contributions. For improved estimation, we propose a new conceptual framework for the genomic relationship matrix, in which Mahalanobis distance-based LD-correction is used in a linear mixed model estimation of SNP-heritability. The superiority of the presented method is illustrated and compared to mixed-model analyses using a VanRaden genomic relationship matrix, a matrix used by GCTA and a matrix employing LD-weighting (as implemented in the LDAK software) in simulated (using real human, rice and cattle genotypes) and real (maize, rice and mice) datasets. Despite the computational difficulties, our results suggest that by using the proposed method one can improve the accuracy of SNP-heritability estimates in datasets with high LD.
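
As background to the comparison in the abstract above, the VanRaden genomic relationship matrix used as one of the baselines weights all SNPs equally: genotypes coded 0/1/2 are centred by twice the allele frequency, and the cross-product is scaled by 2 Σ p(1 − p). The pure-Python sketch below computes it for a toy genotype matrix with allele frequencies estimated from the sample itself (an assumption of this sketch). The function name is hypothetical, and the Mahalanobis-distance LD correction proposed in the paper is not implemented here.

```python
def vanraden_grm(genotypes):
    """VanRaden (method 1) genomic relationship matrix.

    genotypes[i][j] is the allele count (0, 1 or 2) of individual i at
    SNP j. Allele frequencies are estimated from the sample, and at
    least one SNP must be polymorphic so that the scale is non-zero.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    # Centre each genotype by its expected value 2p.
    centred = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(zi, zk)) / scale for zk in centred]
        for zi in centred
    ]


# Toy data: 4 individuals, 3 SNPs.
genotypes = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [1, 2, 1],
]
grm = vanraden_grm(genotypes)
```

Because every SNP contributes with the same weight, markers in strong LD with each other are effectively counted several times, which is the dependence that the LD-aware alternatives listed in the abstract are designed to address.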


Subject(s)
Genomics/methods; Linkage Disequilibrium; Models, Genetic; Polymorphism, Single Nucleotide; Animals; Cattle; Genotype; Humans; Linear Models; Mice; Oryza; Software; Zea mays
18.
Behav Genet ; 47(6): 620-641, 2017 11.
Article in English | MEDLINE | ID: mdl-28879484

ABSTRACT

Estimating dynamic effects of age on the genetic and environmental variance components in twin studies may contribute to the investigation of gene-environment interactions, and may enable more accurate and powerful estimation of heritability. Existing parametric models for estimating dynamic variance components suffer from various drawbacks such as limitation of predefined functions. We present ACEt, an R package for fast estimation of dynamic variance components and heritability that may change with respect to age or other moderators. Building on the twin models using penalized splines, ACEt provides a unified framework to incorporate a class of ACE models, in which each component can be modeled independently and is not limited by a linear or quadratic function. We demonstrate that ACEt is robust against misspecification of the number of spline knots, and offers a refined resolution of dynamic behavior of the genetic and environmental components and thus a detailed estimation of age-specific heritability. Moreover, we develop resampling methods for testing twin models with different variance functions including splines, log-linearity and constancy, which can be easily employed to verify various model assumptions. We evaluated the type I error rate and statistical power of the proposed hypothesis testing procedures under various scenarios using simulated datasets. Potential numerical issues and computational cost were also assessed through simulations. We applied the ACEt package to a Finnish twin cohort to investigate age-specific heritability of body mass index and height. Our results show that the age-specific variance components of these two traits exhibited substantially different patterns despite comparable estimates of heritability. In summary, the ACEt R package offers a useful tool for the exploration of age-dependent heritability and model comparison in twin studies.


Subject(s)
Genome-Wide Association Study/methods , Genome-Wide Association Study/statistics & numerical data , Computer Simulation , Environment , Gene-Environment Interaction , Humans , Likelihood Functions , Models, Genetic , Quantitative Trait, Heritable , Software , Twins/genetics , Twins/statistics & numerical data
19.
Biom J ; 59(1): 110-125, 2017 Jan.
Article in English | MEDLINE | ID: mdl-27740692

ABSTRACT

Science can be seen as a sequential process in which each new study adds evidence to existing knowledge. To have the best prospect of making an impact in this process, a new study should be designed optimally, taking into account previous studies and other prior information. We propose a formal approach to covariate prioritization, that is, the decision about which covariates to measure in a new study. The decision criteria can be based on conditional power, change of the p-value, change in the lower confidence limit, Kullback-Leibler divergence, Bayes factors, Bayesian false discovery rate, or the difference between prior and posterior expectation. The criteria can also be used for decisions on the sample size. As an illustration, we consider covariate prioritization based on genome-wide association studies for C-reactive protein levels and make suggestions on the genes to be studied further.
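One of the listed criteria, the Kullback-Leibler divergence between posterior and prior, can be illustrated with a simplified Python sketch. It assumes a normal prior on a single effect and a normal likelihood with known standard deviation, which is a textbook conjugate setting and not necessarily the model used in the paper; all names and numeric values are hypothetical.

```python
import math


def kl_normal(mu_q, sd_q, mu_p, sd_p):
    """KL(q || p) for univariate normals q = N(mu_q, sd_q^2), p = N(mu_p, sd_p^2)."""
    return (
        math.log(sd_p / sd_q)
        + (sd_q**2 + (mu_q - mu_p) ** 2) / (2 * sd_p**2)
        - 0.5
    )


def posterior_normal(mu0, sd0, ybar, sigma, n):
    """Conjugate update of a normal mean with known sampling sd `sigma`."""
    precision = 1 / sd0**2 + n / sigma**2
    mean = (mu0 / sd0**2 + n * ybar / sigma**2) / precision
    return mean, math.sqrt(1 / precision)


# Hypothetical prior and data summary (illustrative values only).
mu0, sd0 = 0.0, 1.0      # prior mean and sd of the effect
ybar, sigma = 0.3, 2.0   # assumed observed mean and known sampling sd

kl_by_n = {}
for n in (10, 100, 1000):
    mu_n, sd_n = posterior_normal(mu0, sd0, ybar, sigma, n)
    kl_by_n[n] = kl_normal(mu_n, sd_n, mu0, sd0)
    print(f"n = {n}: KL(posterior || prior) = {kl_by_n[n]:.3f}")
```

Under these assumptions the divergence grows with the sample size, which gives some intuition for how such a criterion could inform a sample-size decision as well as a choice among covariates.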


Subject(s)
Meta-Analysis as Topic , Research Design/trends , Bayes Theorem , C-Reactive Protein/genetics , Genome-Wide Association Study , Probability , Sample Size
20.
Genet Epidemiol ; 39(2): 89-100, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25395270

ABSTRACT

Next-generation sequencing (NGS) has led to the study of rare genetic variants, which possibly explain the missing heritability for complex diseases. Most existing methods for rare variant (RV) association detection do not account for the common presence of sequencing errors in NGS data. Because minor alleles are observed only rarely, these errors can substantially reduce the power and perturb the accuracy of association tests. We developed a hierarchical Bayesian approach to estimate the association between RVs and complex diseases. Our integrated framework combines the misclassification probability with shrinkage-based Bayesian variable selection. It allows for flexibility in handling neutral and protective RVs with measurement error, and is robust enough to detect causal RVs across a wide spectrum of minor allele frequency (MAF). Imputation uncertainty and MAF are incorporated into the integrated framework to achieve optimal statistical power. We demonstrate that sequencing error significantly affects the findings, and that our proposed model can take advantage of this to improve statistical power in both simulated and real data. We further show that our model outperforms existing methods, such as the sequence kernel association test (SKAT). Finally, we illustrate the behavior of the proposed method using a Finnish low-density lipoprotein cholesterol study, and show that it identifies an RV known as FH North Karelia in the LDLR gene with three carriers in 1,155 individuals, which is missed by both SKAT and Granvil.
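The sensitivity of rare variants to sequencing error can be illustrated with a short Python sketch. It is not the hierarchical Bayesian model of the paper; it only applies Bayes' rule to a simple misclassification model. The false-positive and false-negative rates, the common-variant carrier frequency, and the function names are hypothetical; the rare carrier frequency uses the three carriers in 1,155 individuals mentioned in the abstract.

```python
def observed_carrier_rate(p_true, fp, fn):
    """Probability that an individual is called a carrier, given error rates."""
    return p_true * (1 - fn) + (1 - p_true) * fp


def positive_predictive_value(p_true, fp, fn):
    """Fraction of called carriers who truly carry the variant (Bayes' rule)."""
    true_calls = p_true * (1 - fn)
    return true_calls / observed_carrier_rate(p_true, fp, fn)


# Hypothetical per-individual calling error rates (assumptions, not estimates).
fp, fn = 0.001, 0.01

p_rare = 3 / 1155   # carrier frequency from the abstract's example
p_common = 0.2      # hypothetical common variant for comparison

ppv_rare = positive_predictive_value(p_rare, fp, fn)
ppv_common = positive_predictive_value(p_common, fp, fn)

print(f"rare variant:   PPV = {ppv_rare:.3f}")
print(f"common variant: PPV = {ppv_common:.3f}")
```

With the same assumed error rates, a much larger share of the called carriers is spurious for the rare variant than for the common one, which is why ignoring misclassification can distort rare-variant association tests.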


Subject(s)
Genetic Variation/genetics , Genotype , Uncertainty , Alleles , Base Sequence , Bayes Theorem , Cholesterol, LDL/genetics , Finland , Gene Frequency , Genetic Predisposition to Disease/genetics , Heterozygote , High-Throughput Nucleotide Sequencing , Humans , Models, Genetic , Receptors, LDL/genetics , Software