Results 1 - 20 of 29
1.
Bioinformatics ; 39(10)2023 10 03.
Article in English | MEDLINE | ID: mdl-37774002

ABSTRACT

MOTIVATION: Investigating cell differentiation under a genetic disorder offers the potential for improving current gene therapy strategies. Clonal tracking provides a basis for mathematical modelling of population stem cell dynamics that sustain the blood cell formation, a process known as haematopoiesis. However, many clonal tracking protocols rely on a subset of cell types for the characterization of the stem cell output, and the data generated are subject to measurement errors and noise. RESULTS: We propose a stochastic framework to infer dynamic models of cell differentiation from clonal tracking data. A state-space formulation combines a stochastic quasi-reaction network, describing cell differentiation, with a Gaussian measurement model accounting for data errors and noise. We developed an inference algorithm based on an extended Kalman filter, a nonlinear optimization, and a Rauch-Tung-Striebel smoother. Simulations show that our proposed method outperforms the state-of-the-art and scales to complex structures of cell differentiation in terms of node size and network depth. The application of our method to five in vivo gene therapy studies reveals different dynamics of cell differentiation. Our tool can provide statistical support to biologists and clinicians to better understand cell differentiation and haematopoietic reconstitution after a gene therapy treatment. The equations of the state-space model can be modified to infer other dynamics besides cell differentiation. AVAILABILITY AND IMPLEMENTATION: The stochastic framework is implemented in the R package Karen which is available for download at https://cran.r-project.org/package=Karen. The code that supports the findings of this study is openly available at https://github.com/delcore-luca/CellDifferentiationNetworks.


Subjects
Algorithms , Theoretical Models , Cell Differentiation , Hematopoiesis/genetics , Gene Regulatory Networks
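The filtering step named in the abstract above can be illustrated with a minimal scalar sketch. This is a plain linear Kalman filter, a deliberately simpler relative of the extended Kalman filter the authors describe, and it is not taken from the Karen package. The function names and the parameters `a`, `q`, `h` and `r` are hypothetical and chosen only for illustration.

```python
def kalman_step(x, p, y, a=1.0, q=0.0, h=1.0, r=1.0):
    """One predict/update cycle of a scalar linear Kalman filter.

    Assumed toy model (not the paper's state-space model):
        state:       x_t = a * x_{t-1} + noise with variance q
        measurement: y_t = h * x_t     + noise with variance r
    x and p are the previous state estimate and its variance.
    """
    # Predict: propagate the estimate and its uncertainty through the dynamics.
    x_pred = a * x
    p_pred = a * a * p + q
    # Update: weight the new measurement by the Kalman gain.
    gain = p_pred * h / (h * h * p_pred + r)
    x_new = x_pred + gain * (y - h * x_pred)
    p_new = (1.0 - gain * h) * p_pred
    return x_new, p_new


def kalman_filter(observations, x0=0.0, p0=1.0, **model):
    """Run kalman_step over a sequence and return the filtered (mean, variance) pairs."""
    x, p = x0, p0
    filtered = []
    for y in observations:
        x, p = kalman_step(x, p, y, **model)
        filtered.append((x, p))
    return filtered
```

An extended Kalman filter follows the same predict/update pattern but linearizes a nonlinear transition function around the current estimate at each step.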
2.
BMC Bioinformatics ; 24(1): 228, 2023 Jun 02.
Article in English | MEDLINE | ID: mdl-37268887

ABSTRACT

BACKGROUND: Mathematical models of haematopoiesis can provide insights on abnormal cell expansions (clonal dominance), and in turn can guide safety monitoring in gene therapy clinical applications. Clonal tracking is a recent high-throughput technology that can be used to quantify cells arising from a single haematopoietic stem cell ancestor after a gene therapy treatment. Thus, clonal tracking data can be used to calibrate the stochastic differential equations describing clonal population dynamics and hierarchical relationships in vivo. RESULTS: In this work we propose a random-effects stochastic framework that allows investigating the presence of events of clonal dominance from high-dimensional clonal tracking data. Our framework is based on the combination of stochastic reaction networks and mixed-effects generalized linear models. Starting from the Kramers-Moyal approximated Master equation, the dynamics of cell duplication, death and differentiation at the clonal level can be described by a local linear approximation. The parameters of this formulation, which are inferred using a maximum likelihood approach, are assumed to be shared across the clones and are not sufficient to describe situations in which clones exhibit heterogeneity in their fitness that can lead to clonal dominance. In order to overcome this limitation, we extend the base model by introducing random effects for the clonal parameters. This extended formulation is calibrated to the clonal data using a tailor-made expectation-maximization algorithm. We also provide the companion R package RestoreNet, publicly available for download at https://cran.r-project.org/package=RestoreNet. CONCLUSIONS: Simulation studies show that our proposed method outperforms the state-of-the-art. The application of our method in two in-vivo studies unveils the dynamics of clonal dominance. Our tool can provide statistical support to biologists in gene therapy safety analyses.


Subjects
Algorithms , Theoretical Models , Likelihood Functions , Computer Simulation , Clone Cells , Stochastic Processes
3.
Bioinformatics ; 38(22): 5049-5054, 2022 11 15.
Article in English | MEDLINE | ID: mdl-36179082

ABSTRACT

MOTIVATION: Gaussian graphical models (GGMs) are network representations of random variables (as nodes) and their partial correlations (as edges). GGMs overcome the challenges of high-dimensional data analysis by using shrinkage methodologies. Therefore, they have become useful to reconstruct gene regulatory networks from gene-expression profiles. However, it is often ignored that the partial correlations are 'shrunk' and that they cannot be compared/assessed directly. Therefore, accurate (differential) network analyses need to account for the number of variables, the sample size, and also the shrinkage value; otherwise, the analysis and its biological interpretation would be biased. To date, there are no appropriate methods to account for these factors and address these issues. RESULTS: We derive the statistical properties of the partial correlation obtained with the Ledoit-Wolf shrinkage. Our result provides a toolbox for (differential) network analyses as (i) confidence intervals, (ii) a test for zero partial correlation (null-effects) and (iii) a test to compare partial correlations. Our novel (parametric) methods account for the number of variables, the sample size and the shrinkage values. Additionally, they are computationally fast, simple to implement and require only basic statistical knowledge. Our simulations show that the novel tests perform better than DiffNetFDR (a recently published alternative) in terms of the trade-off between true and false positives. The methods are demonstrated on synthetic data and two gene-expression datasets from Escherichia coli and Mus musculus. AVAILABILITY AND IMPLEMENTATION: The R package with the methods and the R script with the analysis are available at https://github.com/V-Bernal/GeneNetTools. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Regulatory Networks , Mice , Animals , Normal Distribution , Sample Size , Gene Expression
4.
BMC Bioinformatics ; 22(Suppl 2): 196, 2021 Apr 26.
Article in English | MEDLINE | ID: mdl-33902443

ABSTRACT

BACKGROUND: Linear regression models are important tools for learning regulatory networks from gene expression time series. A conventional assumption for non-homogeneous regulatory processes on a short time scale is that the network structure stays constant across time, while the network parameters are time-dependent. The objective is then to learn the network structure along with changepoints that divide the time series into time segments. An uncoupled model learns the parameters separately for each segment, while a coupled model enforces the parameters of any segment to stay similar to those of the previous segment. In this paper, we propose a new consensus model that infers for each individual time segment whether it is coupled to (or uncoupled from) the previous segment. RESULTS: The results show that the new consensus model is superior to the uncoupled and the coupled model, as well as superior to a recently proposed generalized coupled model. CONCLUSIONS: The newly proposed model has the uncoupled and the coupled model as limiting cases, and it is able to infer the best trade-off between them from the data.


Subjects
Algorithms , Gene Regulatory Networks , Bayes Theorem , Linear Models
5.
BMC Bioinformatics ; 22(1): 424, 2021 Sep 07.
Article in English | MEDLINE | ID: mdl-34493207

ABSTRACT

BACKGROUND: In systems biology, it is important to reconstruct regulatory networks from quantitative molecular profiles. Gaussian graphical models (GGMs) are one of the most popular methods to this end. A GGM consists of nodes (representing the transcripts, metabolites or proteins) inter-connected by edges (reflecting their partial correlations). Learning the edges from quantitative molecular profiles is statistically challenging, as there are usually fewer samples than nodes ('high dimensional problem'). Shrinkage methods address this issue by learning a regularized GGM. However, it remains open to study how the shrinkage affects the final result and its interpretation. RESULTS: We show that the shrinkage biases the partial correlation in a non-linear way. This bias does not only change the magnitudes of the partial correlations but also affects their order. Furthermore, it makes networks obtained from different experiments incomparable and hinders their biological interpretation. We propose a method, referred to as 'un-shrinking' the partial correlation, which corrects for this non-linear bias. Unlike traditional methods, which use a fixed shrinkage value, the new approach provides partial correlations that are closer to the actual (population) values and that are easier to interpret. This is demonstrated on two gene expression datasets from Escherichia coli and Mus musculus. CONCLUSIONS: GGMs are popular undirected graphical models based on partial correlations. The application of GGMs to reconstruct regulatory networks is commonly performed using shrinkage to overcome the 'high-dimensional problem'. Besides its advantages, we have identified that the shrinkage introduces a non-linear bias in the partial correlations. Ignoring this type of effect caused by the shrinkage can obscure the interpretation of the network, and impede the validation of earlier reported results.


Subjects
Systems Biology , Animals , Mice , Normal Distribution
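The dependence of partial correlations on the shrinkage value, discussed in the abstract above, can be seen in a small sketch. It assumes the common shrinkage form R* = (1 - lam) * R + lam * I for a correlation matrix R, and computes partial correlations from the inverse of R*. It does not implement the authors' 'un-shrinking' correction, and the function names are hypothetical.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity; the right half becomes the inverse.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def shrunk_partial_correlations(corr, lam):
    """Partial correlations from the shrunk correlation (1 - lam) * corr + lam * I."""
    n = len(corr)
    shrunk = [[(1.0 - lam) * corr[i][j] + (lam if i == j else 0.0)
               for j in range(n)] for i in range(n)]
    prec = invert(shrunk)  # precision (inverse correlation) matrix
    # Standardize the negated off-diagonal precision entries.
    return [[1.0 if i == j else -prec[i][j] / (prec[i][i] * prec[j][j]) ** 0.5
             for j in range(n)] for i in range(n)]
```

Calling `shrunk_partial_correlations` on the same correlation matrix with several values of `lam` shows how the edge weights of the resulting network change with the shrinkage value alone.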
6.
Bioinformatics ; 36(4): 1198-1207, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31504191

ABSTRACT

MOTIVATION: Non-homogeneous dynamic Bayesian networks (NH-DBNs) are a popular tool for learning networks with time-varying interaction parameters. A multiple changepoint process is used to divide the data into disjoint segments and the network interaction parameters are assumed to be segment-specific. The objective is to infer the network structure along with the segmentation and the segment-specific parameters from the data. The conventional (uncoupled) NH-DBNs do not allow for information exchange among segments, and the interaction parameters have to be learned separately for each segment. More advanced coupled NH-DBN models allow the interaction parameters to vary but enforce them to stay similar over time. As the enforced similarity of the network parameters can have counter-productive effects, we propose a new consensus NH-DBN model that combines features of the uncoupled and the coupled NH-DBN. The new model infers for each individual edge whether its interaction parameter stays similar over time (and should be coupled) or if it changes from segment to segment (and should stay uncoupled). RESULTS: Our new model yields higher network reconstruction accuracies than state-of-the-art models for synthetic and yeast network data. For gene expression data from A. thaliana our new model infers a plausible network topology and yields hypotheses about the light-dependencies of the gene interactions. AVAILABILITY AND IMPLEMENTATION: Data are available from earlier publications. Matlab code is available at Bioinformatics online. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Gene Regulatory Networks , Bayes Theorem , Gene Expression Profiling , Saccharomyces cerevisiae/genetics
7.
Bioinformatics ; 35(12): 2108-2117, 2019 06 01.
Article in English | MEDLINE | ID: mdl-30395165

ABSTRACT

MOTIVATION: Non-homogeneous dynamic Bayesian networks (NH-DBNs) are a popular modelling tool for learning cellular networks from time series data. In systems biology, time series are often measured under different experimental conditions, and not rarely only some network interaction parameters depend on the condition while the other parameters stay constant across conditions. For this situation, we propose a new partially NH-DBN, based on Bayesian hierarchical regression models with partitioned design matrices. With regard to our main application to semi-quantitative (immunoblot) timecourse data from mammalian target of rapamycin complex 1 (mTORC1) signalling, we also propose a Gaussian process-based method to solve the problem of non-equidistant time series measurements. RESULTS: On synthetic network data and on yeast gene expression data the new model leads to improved network reconstruction accuracies. We then use the new model to reconstruct the topologies of the circadian clock network in Arabidopsis thaliana and the mTORC1 signalling pathway. The inferred network topologies show features that are consistent with the biological literature. AVAILABILITY AND IMPLEMENTATION: All datasets have been made available with earlier publications. Our Matlab code is available upon request. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computer Simulation , Algorithms , Bayes Theorem , Gene Expression Profiling , Gene Regulatory Networks , Normal Distribution
8.
Bioinformatics ; 35(23): 5011-5017, 2019 12 01.
Article in English | MEDLINE | ID: mdl-31077287

ABSTRACT

MOTIVATION: One of the main goals in systems biology is to learn molecular regulatory networks from quantitative profile data. In particular, Gaussian graphical models (GGMs) are widely used network models in bioinformatics where variables (e.g. transcripts, metabolites or proteins) are represented by nodes, and pairs of nodes are connected with an edge according to their partial correlation. Reconstructing a GGM from data is a challenging task when the sample size is smaller than the number of variables. The main problem consists in finding the inverse of the covariance estimator which is ill-conditioned in this case. Shrinkage-based covariance estimators are a popular approach, producing an invertible 'shrunk' covariance. However, a proper significance test for the 'shrunk' partial correlation (i.e. the GGM edges) is an open challenge as a probability density including the shrinkage is unknown. In this article, we present (i) a geometric reformulation of the shrinkage-based GGM, and (ii) a probability density that naturally includes the shrinkage parameter. RESULTS: Our results show that the inference using this new 'shrunk' probability density is as accurate as Monte Carlo estimation (an unbiased non-parametric method) for any shrinkage value, while being computationally more efficient. We show on synthetic data how the novel test for significance allows an accurate control of the Type I error and outperforms the network reconstruction obtained by the widely used R package GeneNet. This is further highlighted in two gene expression datasets from stress response in Escherichia coli, and the effect of influenza infection in Mus musculus. AVAILABILITY AND IMPLEMENTATION: https://github.com/V-Bernal/GGM-Shrinkage. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Software , Animals , Mice , Monte Carlo Method , Normal Distribution , Systems Biology
9.
J Environ Manage ; 276: 111296, 2020 Dec 15.
Article in English | MEDLINE | ID: mdl-32906073

ABSTRACT

Drought is a complex natural hazard. It occurs due to a prolonged period of deficient rainfall in a certain region. Unlike other natural hazards, drought has a recurrent occurrence. Therefore, comprehensive drought monitoring is essential for regional climate control and water management authorities. In this paper, we propose a new drought indicator: the Seasonally Combinative Regional Drought Indicator (SCRDI). The SCRDI integrates Bayesian networking theory with the Standardized Precipitation Temperature Index (SPTI) at varying gauge stations in various months/seasons. The application of SCRDI is based on five gauging stations in the Northern Area of Pakistan. We have found that the proposed indicator accounts for the effect of climate variation within a specified territory, accurately characterizes drought by capturing seasonal dependencies in a geospatial variation scenario, and reduces the large/complex data for future drought monitoring. In summary, the proposed indicator can be used for comprehensive characterization and assessment of drought in a given region.


Subjects
Droughts , Bayes Theorem , Pakistan , Seasons , Temperature
10.
Comput Stat ; 32(2): 717-761, 2017.
Article in English | MEDLINE | ID: mdl-32103862

ABSTRACT

Thermodynamic integration (TI) for computing marginal likelihoods is based on an inverse annealing path from the prior to the posterior distribution. In many cases, the resulting estimator suffers from high variability, which particularly stems from the prior regime. When comparing complex models with differences in a comparatively small number of parameters, intrinsic errors from sampling fluctuations may outweigh the differences in the log marginal likelihood estimates. In the present article, we propose a TI scheme that directly targets the log Bayes factor. The method is based on a modified annealing path between the posterior distributions of the two models compared, which systematically avoids the high variance prior regime. We combine this scheme with the concept of non-equilibrium TI to minimise discretisation errors from numerical integration. Results obtained on Bayesian regression models applied to standard benchmark data, and a complex hierarchical model applied to biopathway inference, demonstrate a significant reduction in estimator variance over state-of-the-art TI methods.
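The identity behind standard thermodynamic integration, log Z = integral over t in [0, 1] of E_t[log L], can be checked on a toy model where every quantity is available in closed form. The sketch below uses an assumed conjugate model (theta ~ N(0, 1), y | theta ~ N(theta, 1)) and a trapezoid rule over an equally spaced temperature ladder. It illustrates the classical prior-to-posterior path only, not the modified path between two posteriors proposed in the article, and all function names are hypothetical.

```python
import math


def expected_log_lik(t, y):
    """E[log L(theta)] under the power posterior p_t, proportional to prior * L^t.

    Toy model (an assumption for illustration):
        theta ~ N(0, 1),  y | theta ~ N(theta, 1), one observation y.
    Then p_t is N(t*y/(1+t), 1/(1+t)) and the expectation is closed form.
    """
    var_t = 1.0 / (1.0 + t)
    mean_t = t * y * var_t
    # E[(y - theta)^2] = (y - mean_t)^2 + var_t
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * ((y - mean_t) ** 2 + var_t)


def log_marginal_ti(y, steps=1000):
    """Trapezoid-rule estimate of the log marginal likelihood along the annealing path."""
    h = 1.0 / steps
    values = [expected_log_lik(i * h, y) for i in range(steps + 1)]
    return h * (0.5 * values[0] + sum(values[1:-1]) + 0.5 * values[-1])


def log_marginal_exact(y):
    """Exact log marginal likelihood: marginally, y ~ N(0, 2)."""
    return -0.5 * math.log(4.0 * math.pi) - y * y / 4.0
```

In realistic models the expectation at each temperature is estimated by MCMC rather than computed exactly, which is where the sampling variance discussed in the abstract enters.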

11.
Stat Appl Genet Mol Biol ; 14(2): 143-67, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25719342

ABSTRACT

There has been much interest in reconstructing bi-directional regulatory networks linking the circadian clock to metabolism in plants. A variety of reverse engineering methods from machine learning and computational statistics have been proposed and evaluated. The emphasis of the present paper is on combining models in a model ensemble to boost the network reconstruction accuracy, and to explore various model combination strategies to maximize the improvement. Our results demonstrate that a rich ensemble of predictors outperforms the best individual model, even if the ensemble includes poor predictors with inferior individual reconstruction accuracy. For our application to metabolomic and transcriptomic time series from various mutagenesis plants grown in different light-dark cycles we also show how to determine the optimal time lag between interactions, and we identify significant interactions with a randomization test. Our study predicts new statistically significant interactions between circadian clock genes and metabolites in Arabidopsis thaliana, and thus provides independent statistical evidence that the regulation of metabolism by the circadian clock is not uni-directional, but that there is a statistically significant feedback mechanism aiming from metabolism back to the circadian clock.


Subjects
Circadian Clocks/genetics , Metabolome/genetics , Arabidopsis/genetics , Arabidopsis Proteins/genetics , Circadian Rhythm/genetics , Plant Gene Expression Regulation/genetics , Plant Genes/genetics , Light , Genetic Models
12.
Stat Appl Genet Mol Biol ; 13(3): 227-73, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24864301

ABSTRACT

We assess the accuracy of various state-of-the-art statistics and machine learning methods for reconstructing gene and protein regulatory networks in the context of circadian regulation. Our study draws on the increasing availability of gene expression and protein concentration time series for key circadian clock components in Arabidopsis thaliana. In addition, gene expression and protein concentration time series are simulated from a recently published regulatory network of the circadian clock in A. thaliana, in which protein and gene interactions are described by a Markov jump process based on Michaelis-Menten kinetics. We closely follow recent experimental protocols, including the entrainment of seedlings to different light-dark cycles and the knock-out of various key regulatory genes. Our study provides relative network reconstruction accuracy scores for a critical comparative performance evaluation, and sheds light on a series of highly relevant questions: it quantifies the influence of systematically missing values related to unknown protein concentrations and mRNA transcription rates, it investigates the dependence of the performance on the network topology and the degree of recurrency, it provides deeper insight into when and why non-linear methods fail to outperform linear ones, it offers improved guidelines on parameter settings in different inference procedures, and it suggests new hypotheses about the structure of the central circadian gene regulatory network in A. thaliana.


Subjects
Arabidopsis/genetics , Arabidopsis/physiology , Circadian Rhythm/genetics , Plant Gene Expression Regulation , Gene Regulatory Networks , Statistics as Topic , Arabidopsis/radiation effects , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Area Under Curve , Bayes Theorem , Plant Gene Expression Regulation/radiation effects , Light , Genetic Models , ROC Curve , Regression Analysis
13.
J Appl Stat ; 51(5): 845-865, 2024.
Article in English | MEDLINE | ID: mdl-38524794

ABSTRACT

Statistical learning of the structures of cellular networks, such as protein signaling pathways, is a topical research field in computational systems biology. To get the most information out of experimental data, it is often required to develop a tailored statistical approach rather than applying one of the off-the-shelf network reconstruction methods. The focus of this paper is on learning the structure of the mTOR protein signaling pathway from immunoblotting protein phosphorylation data. Under two experimental conditions eleven phosphorylation sites of eight key proteins of the mTOR pathway were measured at ten non-equidistant time points. For the statistical analysis we propose a new advanced hierarchically coupled non-homogeneous dynamic Bayesian network (NH-DBN) model, and we consider various data imputation methods for dealing with non-equidistant temporal observations. Because of the absence of a true gold standard network, we propose to use predictive probabilities in combination with a leave-one-out cross validation strategy to objectively cross-compare the accuracies of different NH-DBN models and data imputation methods. Finally, we employ the best combination of model and data imputation method for predicting the structure of the mTOR protein signaling pathway.

14.
PLoS One ; 19(4): e0302619, 2024.
Article in English | MEDLINE | ID: mdl-38640095

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pone.0294556.].

15.
Stat Appl Genet Mol Biol ; 11(4)2012 Jul 12.
Article in English | MEDLINE | ID: mdl-22850067

ABSTRACT

An important and challenging problem in systems biology is the inference of gene regulatory networks from short non-stationary time series of transcriptional profiles. A popular approach that has been widely applied to this end is based on dynamic Bayesian networks (DBNs), although traditional homogeneous DBNs fail to model the non-stationarity and time-varying nature of the gene regulatory processes. Various authors have therefore recently proposed combining DBNs with multiple changepoint processes to obtain time varying dynamic Bayesian networks (TV-DBNs). However, TV-DBNs are not without problems. Gene expression time series are typically short, which leaves the model over-flexible, leading to over-fitting or inflated inference uncertainty. In the present paper, we introduce a Bayesian regularization scheme that addresses this difficulty. Our approach is based on the rationale that changes in gene regulatory processes appear gradually during an organism's life cycle or in response to a changing environment, and we have integrated this notion in the prior distribution of the TV-DBN parameters. We have extensively tested our regularized TV-DBN model on synthetic data, in which we have simulated short non-homogeneous time series produced from a system subject to gradual change. We have then applied our method to real-world gene expression time series, measured during the life cycle of Drosophila melanogaster, under artificially generated constant light condition in Arabidopsis thaliana, and from a synthetically designed strain of Saccharomyces cerevisiae exposed to a changing environment.


Subjects
Gene Regulatory Networks , Synthetic Biology/statistics & numerical data , Systems Biology/statistics & numerical data , Algorithms , Arabidopsis/genetics , Bayes Theorem , Computational Biology/methods , Computational Biology/statistics & numerical data , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Fungal Gene Expression Regulation , Plant Gene Expression Regulation , Gene Regulatory Networks/physiology , Genetic Heterogeneity , Statistical Models , Oligonucleotide Array Sequence Analysis , Saccharomyces cerevisiae/genetics , Synthetic Biology/methods , Systems Biology/methods
16.
PLoS One ; 18(11): e0294556, 2023.
Article in English | MEDLINE | ID: mdl-38019869

ABSTRACT

BACKGROUND: Severe acute respiratory syndrome coronavirus-2 (SARS-COV-2) can affect anyone; however, it is often confused with other respiratory diseases. This study aimed to identify the factors associated with a positive SARS-COV-2 test. METHODS: Participants from the Northern Netherlands representative of the general population were included if they filled in the questionnaire about well-being between June 2020 and April 2021 and were tested for SARS-COV-2. The outcome was a self-reported test as measured by polymerase chain reaction. The data were collected on age, sex, household, smoking, alcohol use, physical activity, quality of life, fatigue, symptoms and medication use. Participants were matched on sex, age and the timing of their SARS-COV-2 tests maintaining a 1:4 ratio and classified into those with a positive and negative SARS-COV-2 test using logistic regression. The performance of the model was compared with other machine-learning algorithms by the area under the receiver operating characteristic curve. RESULTS: 2564 (20%) of 12786 participants had a positive SARS-COV-2 test. The factors associated with a higher risk of a positive SARS-COV-2 test in multivariate logistic regression were: contact with someone who tested positive for SARS-COV-2, ≥1 household members, typical SARS-COV-2 symptoms, male gender and fatigue. The factors associated with a lower risk of a positive SARS-COV-2 test were higher quality of life, inhaler use, runny nose, lower back pain, diarrhea, pain when breathing, sore throat, pain in neck, shoulder or arm, numbness or tingling, and stomach pain. The performance of the logistic models was comparable with that of random forest, support vector machine and gradient boosting machine. CONCLUSIONS: Having contact with someone who tested positive for SARS-COV-2 and living in a household with someone else are the most important factors related to a positive SARS-COV-2 test. The loss of smell or taste is the most prominent symptom associated with a positive test. Symptoms like runny nose, pain when breathing and sore throat are more likely to be indicative of other conditions.


Subjects
COVID-19 , Pharyngitis , Humans , Male , SARS-CoV-2 , COVID-19/diagnosis , Quality of Life , Pain , Rhinorrhea
17.
J Appl Stat ; 50(10): 2171-2193, 2023.
Article in English | MEDLINE | ID: mdl-37434627

ABSTRACT

We develop a generalized linear mixed model (GLMM) for bivariate count responses for statistically analyzing dragonfly population data from the Northern Netherlands. The populations of the threatened dragonfly species Aeshna viridis were counted in the years 2015-2018 at 17 different locations (ponds and ditches). Two widely applied measures were used to quantify the population sizes, namely the number of exoskeletons ('exuviae') found and the number of egg-laying females spotted. Since both measures (responses) led to many zero counts but also feature very large counts, our GLMM builds on a zero-inflated bivariate geometric (ZIBGe) distribution, which we show can be easily parameterized in terms of a correlation parameter and its two marginal medians. We model the medians with linear combinations of fixed (environmental covariates) and random (location-specific intercepts) effects. Modeling the medians yields a decreased sensitivity to overly large counts, in particular in light of growing marginal zero inflation rates. Because of the relatively small sample size (n = 114) we follow a Bayesian modeling approach and use Metropolis-Hastings Markov Chain Monte Carlo (MCMC) simulations for generating posterior samples.
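To make the idea of a zero-inflated geometric count and its median concrete, here is a univariate sketch. It is a simplification of the bivariate ZIBGe distribution named in the abstract: the parameters `p` (geometric success probability) and `pi0` (extra probability of a structural zero) and the function names are hypothetical, and the median is found numerically rather than through the authors' median-based parameterization.

```python
def zi_geometric_pmf(k, p, pi0):
    """P(K = k) for a zero-inflated geometric count on k = 0, 1, 2, ...

    With probability pi0 the count is a structural zero; otherwise it is
    geometric with P(K = k) = p * (1 - p) ** k.
    """
    if k < 0:
        return 0.0
    geometric = p * (1.0 - p) ** k
    return (pi0 if k == 0 else 0.0) + (1.0 - pi0) * geometric


def zi_geometric_cdf(k, p, pi0):
    """P(K <= k), using the closed-form geometric CDF 1 - (1 - p) ** (k + 1)."""
    if k < 0:
        return 0.0
    return pi0 + (1.0 - pi0) * (1.0 - (1.0 - p) ** (k + 1))


def zi_geometric_median(p, pi0):
    """Smallest k whose cumulative probability reaches one half."""
    k = 0
    while zi_geometric_cdf(k, p, pi0) < 0.5:
        k += 1
    return k
```

Raising `pi0` pulls the median toward zero while leaving the long right tail in place, which is the kind of behaviour that motivates modeling medians rather than means for such counts.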

18.
Bioinformatics ; 27(5): 693-9, 2011 Mar 01.
Article in English | MEDLINE | ID: mdl-21177328

ABSTRACT

METHOD: Dynamic Bayesian networks (DBNs) have been applied widely to reconstruct the structure of regulatory processes from time series data, and they have established themselves as a standard modelling tool in computational systems biology. The conventional approach is based on the assumption of a homogeneous Markov chain, and many recent research efforts have focused on relaxing this restriction. An approach that enjoys particular popularity is based on a combination of a DBN with a multiple changepoint process, and the application of a Bayesian inference scheme via reversible jump Markov chain Monte Carlo (RJMCMC). In the present article, we expand this approach in two ways. First, we show that a dynamic programming scheme allows the changepoints to be sampled from the correct conditional distribution, which results in improved convergence over RJMCMC. Second, we introduce a novel Bayesian clustering and information sharing scheme among nodes, which provides a mechanism for automatic model complexity tuning. RESULTS: We evaluate the dynamic programming scheme on expression time series for Arabidopsis thaliana genes involved in circadian regulation. In a simulation study we demonstrate that the regularization scheme improves the network reconstruction accuracy over that obtained with recently proposed inhomogeneous DBNs. For gene expression profiles from a synthetically designed Saccharomyces cerevisiae strain under switching carbon metabolism we show that the combination of both: dynamic programming and regularization yields an inference procedure that outperforms two alternative established network reconstruction methods from the biology literature. AVAILABILITY AND IMPLEMENTATION: A MATLAB implementation of the algorithm and a supplementary paper with algorithmic details and further results for the Arabidopsis data can be downloaded from: http://www.statistik.tu-dortmund.de/bio2010.html.


Subjects
Algorithms , Computational Biology/methods , Gene Expression Profiling/methods , Gene Regulatory Networks , Arabidopsis/genetics , Bayes Theorem , Cluster Analysis , Markov Chains , Statistical Models , Monte Carlo Method , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism
19.
EBioMedicine ; 71: 103550, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34425309

ABSTRACT

BACKGROUND: The potential role of individual plasma biomarkers in the pathogenesis of type 2 diabetes (T2D) has been broadly studied, but the impact of biomarker interactions remains underexplored. Recently, the Mahalanobis distance (MD) of plasma biomarkers has been proposed as a proxy of physiological dysregulation. Here we aimed to investigate whether the MD calculated from circulating biomarkers is prospectively associated with development of T2D. METHODS: We calculated the MD of the Principal Components (PCs) integrating the information of 32 circulating biomarkers (comprising inflammation, glycemic, lipid, microbiome and one-carbon metabolism) measured in 6247 participants of the PREVEND study without T2D at baseline. Cox proportional-hazards regression analyses were performed to study the association of MD with T2D development. FINDINGS: After a median follow-up of 7·3 years, 312 subjects developed T2D. The overall MD (mean (SD)) was higher in subjects who developed T2D compared to those who did not: 35·65 (26·67) and 30·75 (27·57), respectively (P = 0·002). The highest hazard ratio (HR) was obtained using the MD calculated from the first 31 PCs (per 1 log-unit increment) (1·72 (95% CI 1·42,2·07), P < 0·001). Such associations remained after the adjustment for age, sex, plasma glucose, parental history of T2D, lipids, blood pressure medication, and BMI (HRadj 1·37 (95% CI 1·11,1·70), P = 0·004). INTERPRETATION: Our results are in line with the premise that MD represents an estimate of homeostasis loss. This study suggests that MD is able to provide information about physiological dysregulation also in the pathogenesis of T2D. FUNDING: The Dutch Kidney Foundation (Grant E.033).


Subjects
Aging/blood , Diabetes Mellitus, Type 2/blood , Homeostasis , Metabolome , Adult , Aged , Biomarkers/blood , Data Interpretation, Statistical , Diabetes Mellitus, Type 2/epidemiology , Female , Humans , Male , Middle Aged , Principal Component Analysis
20.
Bioinformatics ; 24(18): 2071-8, 2008 Sep 15.
Article in English | MEDLINE | ID: mdl-18664467

ABSTRACT

METHOD: The objective of the present article is to propose and evaluate a probabilistic approach based on Bayesian networks for modelling non-homogeneous and non-linear gene regulatory processes. The method is based on a mixture model, using latent variables to assign individual measurements to different classes. The practical inference follows the Bayesian paradigm and samples the network structure, the number of classes and the assignment of latent variables from the posterior distribution with Markov Chain Monte Carlo (MCMC), using the recently proposed allocation sampler as an alternative to RJMCMC. RESULTS: We have evaluated the method using three criteria: network reconstruction, statistical significance and biological plausibility. In terms of network reconstruction, we found improved results both for a synthetic network of known structure and for a small real regulatory network derived from the literature. We have assessed the statistical significance of the improvement on gene expression time series for two different systems (viral challenge of macrophages, and circadian rhythms in plants), where the proposed new scheme tends to outperform the classical BGe score. Regarding biological plausibility, we found that the inference results obtained with the proposed method were in excellent agreement with biological findings, predicting dichotomies that one would expect to find in the studied systems. AVAILABILITY: Two supplementary papers on theoretical (T) and experimental (E) aspects and the datasets used in our study are available from http://www.bioss.ac.uk/associates/marco/supplement/
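The idea of latent variables that assign measurements to classes can be illustrated with a much simpler stand-in than the method above. The sketch below is a plain Gibbs sampler for a one-dimensional Gaussian mixture with a fixed number of classes, known variance and equal class weights. It is not the allocation sampler, it does not sample a network structure or the number of classes, and the name `gibbs_allocations` and all of its parameters are hypothetical choices for this example.

```python
import math
import random


def gibbs_allocations(data, k=2, sigma=1.0, prior_sd=10.0, n_iter=200, seed=0):
    """Gibbs sampling of class allocations in a 1-D Gaussian mixture.

    Simplifying assumptions (not from the abstract): fixed k, known sigma,
    equal class weights, and a Normal(0, prior_sd^2) prior on each class mean.
    Returns the final allocations and class means.
    """
    rng = random.Random(seed)
    lo, hi = min(data), max(data)
    # Start the class means evenly spread over the data range.
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    z = [0] * len(data)

    for _ in range(n_iter):
        # Step 1: sample each latent allocation given the current means.
        for i, x in enumerate(data):
            logs = [-0.5 * ((x - m) / sigma) ** 2 for m in means]
            top = max(logs)  # subtract the maximum to avoid underflow
            weights = [math.exp(v - top) for v in logs]
            z[i] = rng.choices(range(k), weights=weights)[0]

        # Step 2: sample each class mean given its members (conjugate update).
        for j in range(k):
            members = [x for x, zi in zip(data, z) if zi == j]
            precision = 1.0 / prior_sd**2 + len(members) / sigma**2
            post_mean = (sum(members) / sigma**2) / precision
            means[j] = rng.gauss(post_mean, math.sqrt(1.0 / precision))

    return z, means
```

On data drawn from two well-separated groups, the sampled allocations split the measurements into the two groups. The approach in the abstract additionally treats the number of classes and the network structure as unknowns to be sampled.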


Subjects
Gene Expression Profiling/methods , Gene Regulatory Networks , Models, Genetic , Models, Statistical , Algorithms , Arabidopsis/genetics , Arabidopsis/physiology , Bayes Theorem , Circadian Rhythm , Computer Simulation , Macrophages/cytology , Macrophages/metabolism , Proteome/metabolism