Results 1 - 20 of 29
1.
PLoS One ; 19(4): e0302619, 2024.
Article in English | MEDLINE | ID: mdl-38640095

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pone.0294556.].

2.
J Appl Stat ; 51(5): 845-865, 2024.
Article in English | MEDLINE | ID: mdl-38524794

ABSTRACT

Statistical learning of the structures of cellular networks, such as protein signaling pathways, is a topical research field in computational systems biology. To get the most information out of experimental data, it is often required to develop a tailored statistical approach rather than applying one of the off-the-shelf network reconstruction methods. The focus of this paper is on learning the structure of the mTOR protein signaling pathway from immunoblotting protein phosphorylation data. Under two experimental conditions eleven phosphorylation sites of eight key proteins of the mTOR pathway were measured at ten non-equidistant time points. For the statistical analysis we propose a new advanced hierarchically coupled non-homogeneous dynamic Bayesian network (NH-DBN) model, and we consider various data imputation methods for dealing with non-equidistant temporal observations. Because of the absence of a true gold standard network, we propose to use predictive probabilities in combination with a leave-one-out cross validation strategy to objectively cross-compare the accuracies of different NH-DBN models and data imputation methods. Finally, we employ the best combination of model and data imputation method for predicting the structure of the mTOR protein signaling pathway.

3.
PLoS One ; 18(11): e0294556, 2023.
Article in English | MEDLINE | ID: mdl-38019869

ABSTRACT

BACKGROUND: Severe acute respiratory syndrome coronavirus-2 (SARS-COV-2) can affect anyone; however, it is often mixed with other respiratory diseases. This study aimed to identify the factors associated with a positive SARS-COV-2 test. METHODS: Participants from the Northern Netherlands representative of the general population were included if they filled in the questionnaire about well-being between June 2020 and April 2021 and were tested for SARS-COV-2. The outcome was a self-reported test as measured by polymerase chain reaction. Data were collected on age, sex, household, smoking, alcohol use, physical activity, quality of life, fatigue, symptoms and medication use. Participants were matched on sex, age and the timing of their SARS-COV-2 tests, maintaining a 1:4 ratio, and classified into those with a positive and those with a negative SARS-COV-2 test using logistic regression. The performance of the model was compared with other machine-learning algorithms by the area under the receiver operating characteristic curve. RESULTS: 2564 (20%) of 12786 participants had a positive SARS-COV-2 test. The factors associated with a higher risk of a positive SARS-COV-2 test in multivariate logistic regression were: contact with someone who tested positive for SARS-COV-2, ≥1 household members, typical SARS-COV-2 symptoms, male gender and fatigue. The factors associated with a lower risk of a positive SARS-COV-2 test were higher quality of life, inhaler use, runny nose, lower back pain, diarrhea, pain when breathing, sore throat, pain in neck, shoulder or arm, numbness or tingling, and stomach pain. The performance of the logistic models was comparable with that of random forest, support vector machine and gradient boosting machine. CONCLUSIONS: Having contact with someone who tested positive for SARS-COV-2 and living in a household with someone else are the most important factors related to a positive SARS-COV-2 test. The loss of smell or taste is the most prominent symptom associated with a positive test. Symptoms like runny nose, pain when breathing and sore throat are more likely to be indicative of other conditions.


Subjects
COVID-19 , Pharyngitis , Humans , Male , SARS-CoV-2 , COVID-19/diagnosis , Quality of Life , Pain , Rhinorrhea
4.
Bioinformatics ; 39(10), 2023 10 03.
Article in English | MEDLINE | ID: mdl-37774002

ABSTRACT

MOTIVATION: Investigating cell differentiation under a genetic disorder offers the potential for improving current gene therapy strategies. Clonal tracking provides a basis for mathematical modelling of population stem cell dynamics that sustain blood cell formation, a process known as haematopoiesis. However, many clonal tracking protocols rely on a subset of cell types for the characterization of the stem cell output, and the data generated are subject to measurement errors and noise. RESULTS: We propose a stochastic framework to infer dynamic models of cell differentiation from clonal tracking data. A state-space formulation combines a stochastic quasi-reaction network, describing cell differentiation, with a Gaussian measurement model accounting for data errors and noise. We developed an inference algorithm based on an extended Kalman filter, a nonlinear optimization, and a Rauch-Tung-Striebel smoother. Simulations show that our proposed method outperforms the state-of-the-art and scales to complex structures of cell differentiation in terms of node size and network depth. The application of our method to five in vivo gene therapy studies reveals different dynamics of cell differentiation. Our tool can provide statistical support to biologists and clinicians to better understand cell differentiation and haematopoietic reconstitution after a gene therapy treatment. The equations of the state-space model can be modified to infer other dynamics besides cell differentiation. AVAILABILITY AND IMPLEMENTATION: The stochastic framework is implemented in the R package Karen which is available for download at https://cran.r-project.org/package=Karen. The code that supports the findings of this study is openly available at https://github.com/delcore-luca/CellDifferentiationNetworks.


Subjects
Algorithms , Theoretical Models , Cell Differentiation , Hematopoiesis/genetics , Gene Regulatory Networks
5.
J Appl Stat ; 50(10): 2171-2193, 2023.
Article in English | MEDLINE | ID: mdl-37434627

ABSTRACT

We develop a generalized linear mixed model (GLMM) for bivariate count responses for statistically analyzing dragonfly population data from the Northern Netherlands. The populations of the threatened dragonfly species Aeshna viridis were counted in the years 2015-2018 at 17 different locations (ponds and ditches). Two widely applied measures were used to quantify the population sizes, namely the number of exoskeletons ('exuviae') found and the number of egg-laying females spotted. Since both measures (responses) led to many zero counts but also feature very large counts, our GLMM builds on a zero-inflated bivariate geometric (ZIBGe) distribution, which we show can be easily parameterized in terms of a correlation parameter and its two marginal medians. We model the medians with linear combinations of fixed (environmental covariates) and random (location-specific intercepts) effects. Modeling the medians yields a decreased sensitivity to overly large counts, in particular in light of growing marginal zero inflation rates. Because of the relatively small sample size (n = 114) we follow a Bayesian modeling approach and use Metropolis-Hastings Markov Chain Monte Carlo (MCMC) simulations for generating posterior samples.
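To illustrate the sampling step mentioned at the end of this abstract, here is a minimal random-walk Metropolis-Hastings sketch. It is deliberately much simpler than the paper's model: it assumes a plain (not zero-inflated, not bivariate) geometric likelihood with a uniform prior on the success probability, and the counts, step size and function names are made up for the example.

```python
import math
import random

def log_posterior(logit_p, counts):
    """Log posterior on the logit scale: geometric likelihood, uniform prior on p."""
    p = 1.0 / (1.0 + math.exp(-logit_p))
    # Geometric pmf on {0, 1, 2, ...}: P(K = k) = p * (1 - p)**k
    log_lik = sum(k * math.log(1.0 - p) + math.log(p) for k in counts)
    # Jacobian of the change of variables p -> logit(p)
    log_jacobian = math.log(p) + math.log(1.0 - p)
    return log_lik + log_jacobian

def metropolis_hastings(counts, n_iter=20000, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings; returns post-burn-in samples of p."""
    rng = random.Random(seed)
    current = 0.0
    current_lp = log_posterior(current, counts)
    samples = []
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)   # symmetric proposal
        proposal_lp = log_posterior(proposal, counts)
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(1.0 / (1.0 + math.exp(-current)))
    return samples[n_iter // 4:]                    # discard first quarter as burn-in

counts = [0, 0, 3, 1, 0, 7, 2, 0, 0, 5]             # hypothetical count data
samples = metropolis_hastings(counts)
posterior_mean = sum(samples) / len(samples)
print(f"posterior mean of p: {posterior_mean:.3f}")
```

For this toy model the posterior is available in closed form, Beta(n + 1, S + 1) with n observations summing to S, so the sampler's output can be checked against the exact posterior mean (n + 1) / (n + S + 2).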

6.
BMC Bioinformatics ; 24(1): 228, 2023 Jun 02.
Article in English | MEDLINE | ID: mdl-37268887

ABSTRACT

BACKGROUND: Mathematical models of haematopoiesis can provide insights on abnormal cell expansions (clonal dominance), and in turn can guide safety monitoring in gene therapy clinical applications. Clonal tracking is a recent high-throughput technology that can be used to quantify cells arising from a single haematopoietic stem cell ancestor after a gene therapy treatment. Thus, clonal tracking data can be used to calibrate the stochastic differential equations describing clonal population dynamics and hierarchical relationships in vivo. RESULTS: In this work we propose a random-effects stochastic framework that allows the investigation of events of clonal dominance from high-dimensional clonal tracking data. Our framework is based on the combination of stochastic reaction networks and mixed-effects generalized linear models. Starting from the Kramers-Moyal approximated Master equation, the dynamics of cell duplication, death and differentiation at the clonal level can be described by a local linear approximation. The parameters of this formulation, which are inferred using a maximum likelihood approach, are assumed to be shared across the clones and are not sufficient to describe situations in which clones exhibit heterogeneity in their fitness that can lead to clonal dominance. In order to overcome this limitation, we extend the base model by introducing random effects for the clonal parameters. This extended formulation is calibrated to the clonal data using a tailor-made expectation-maximization algorithm. We also provide the companion R package RestoreNet, publicly available for download at https://cran.r-project.org/package=RestoreNet. CONCLUSIONS: Simulation studies show that our proposed method outperforms the state-of-the-art. The application of our method in two in-vivo studies unveils the dynamics of clonal dominance. Our tool can provide statistical support to biologists in gene therapy safety analyses.


Subjects
Algorithms , Theoretical Models , Likelihood Functions , Computer Simulation , Clone Cells , Stochastic Processes
7.
Bioinformatics ; 38(22): 5049-5054, 2022 11 15.
Article in English | MEDLINE | ID: mdl-36179082

ABSTRACT

MOTIVATION: Gaussian graphical models (GGMs) are network representations of random variables (as nodes) and their partial correlations (as edges). GGMs overcome the challenges of high-dimensional data analysis by using shrinkage methodologies. Therefore, they have become useful to reconstruct gene regulatory networks from gene-expression profiles. However, it is often ignored that the partial correlations are 'shrunk' and that they cannot be compared/assessed directly. Therefore, accurate (differential) network analyses need to account for the number of variables, the sample size, and also the shrinkage value; otherwise, the analysis and its biological interpretation would be biased. To date, there are no appropriate methods to account for these factors and address these issues. RESULTS: We derive the statistical properties of the partial correlation obtained with the Ledoit-Wolf shrinkage. Our result provides a toolbox for (differential) network analyses as (i) confidence intervals, (ii) a test for zero partial correlation (null-effects) and (iii) a test to compare partial correlations. Our novel (parametric) methods account for the number of variables, the sample size and the shrinkage values. Additionally, they are computationally fast, simple to implement and require only basic statistical knowledge. Our simulations show that the novel tests perform better than DiffNetFDR, a recently published alternative, in terms of the trade-off between true and false positives. The methods are demonstrated on synthetic data and two gene-expression datasets from Escherichia coli and Mus musculus. AVAILABILITY AND IMPLEMENTATION: The R package with the methods and the R script with the analysis are available in https://github.com/V-Bernal/GeneNetTools. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Regulatory Networks , Mice , Animals , Normal Distribution , Sample Size , Gene Expression
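As a rough illustration of what a 'shrunk' partial correlation is, the sketch below shrinks the sample correlation matrix towards the identity, inverts it, and standardises the result. This is not the GeneNetTools code: the function name is hypothetical, and the shrinkage intensity `lam` is fixed by hand here, whereas a Ledoit-Wolf type estimator would choose it from the data.

```python
import numpy as np

def shrunk_partial_correlations(X, lam):
    """Partial correlations from a shrunk correlation matrix.

    X   : array of shape (n_samples, n_variables)
    lam : shrinkage intensity in (0, 1]; hand-picked here (an assumption)
    """
    R = np.corrcoef(X, rowvar=False)               # sample correlation matrix
    p = R.shape[0]
    R_shrunk = (1.0 - lam) * R + lam * np.eye(p)   # shrink towards the identity
    omega = np.linalg.inv(R_shrunk)                # precision matrix
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)                 # standardise and flip the sign
    np.fill_diagonal(pcor, 1.0)
    return pcor

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 25))                      # fewer samples than variables
pcor = shrunk_partial_correlations(X, lam=0.3)
print(pcor.shape)
```

With 10 samples and 25 variables the unshrunk correlation matrix is singular, so the inversion only works because of the shrinkage term. The resulting values depend on `lam`, which is why shrunk partial correlations from different experiments are not directly comparable.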
8.
BMC Bioinformatics ; 22(1): 424, 2021 Sep 07.
Article in English | MEDLINE | ID: mdl-34493207

ABSTRACT

BACKGROUND: In systems biology, it is important to reconstruct regulatory networks from quantitative molecular profiles. Gaussian graphical models (GGMs) are one of the most popular methods to this end. A GGM consists of nodes (representing the transcripts, metabolites or proteins) inter-connected by edges (reflecting their partial correlations). Learning the edges from quantitative molecular profiles is statistically challenging, as there are usually fewer samples than nodes ('high dimensional problem'). Shrinkage methods address this issue by learning a regularized GGM. However, it remains open to study how the shrinkage affects the final result and its interpretation. RESULTS: We show that the shrinkage biases the partial correlation in a non-linear way. This bias not only changes the magnitudes of the partial correlations but also affects their order. Furthermore, it makes networks obtained from different experiments incomparable and hinders their biological interpretation. We propose a method, referred to as 'un-shrinking' the partial correlation, which corrects for this non-linear bias. Unlike traditional methods, which use a fixed shrinkage value, the new approach provides partial correlations that are closer to the actual (population) values and that are easier to interpret. This is demonstrated on two gene expression datasets from Escherichia coli and Mus musculus. CONCLUSIONS: GGMs are popular undirected graphical models based on partial correlations. The application of GGMs to reconstruct regulatory networks is commonly performed using shrinkage to overcome the 'high-dimensional problem'. Besides its advantages, we have identified that the shrinkage introduces a non-linear bias in the partial correlations. Ignoring effects of this type caused by the shrinkage can obscure the interpretation of the network, and impede the validation of earlier reported results.


Subjects
Systems Biology , Animals , Mice , Normal Distribution
9.
EBioMedicine ; 71: 103550, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34425309

ABSTRACT

BACKGROUND: The potential role of individual plasma biomarkers in the pathogenesis of type 2 diabetes (T2D) has been broadly studied, but the impact of biomarker interactions remains underexplored. Recently, the Mahalanobis distance (MD) of plasma biomarkers has been proposed as a proxy of physiological dysregulation. Here we aimed to investigate whether the MD calculated from circulating biomarkers is prospectively associated with development of T2D. METHODS: We calculated the MD of the Principal Components (PCs) integrating the information of 32 circulating biomarkers (comprising inflammation, glycemic, lipid, microbiome and one-carbon metabolism) measured in 6247 participants of the PREVEND study without T2D at baseline. Cox proportional-hazards regression analyses were performed to study the association of MD with T2D development. FINDINGS: After a median follow-up of 7.3 years, 312 subjects developed T2D. The overall MD (mean (SD)) was higher in subjects who developed T2D compared to those who did not: 35.65 (26.67) and 30.75 (27.57), respectively (P = 0.002). The highest hazard ratio (HR) was obtained using the MD calculated from the first 31 PCs (per 1 log-unit increment) (1.72 (95% CI 1.42, 2.07), P < 0.001). Such associations remained after the adjustment for age, sex, plasma glucose, parental history of T2D, lipids, blood pressure medication, and BMI (HRadj 1.37 (95% CI 1.11, 1.70), P = 0.004). INTERPRETATION: Our results are in line with the premise that MD represents an estimate of homeostasis loss. This study suggests that MD is able to provide information about physiological dysregulation also in the pathogenesis of T2D. FUNDING: The Dutch Kidney Foundation (Grant E.033).


Subjects
Aging/blood , Type 2 Diabetes Mellitus/blood , Homeostasis , Metabolome , Adult , Aged , Biomarkers/blood , Statistical Data Interpretation , Type 2 Diabetes Mellitus/epidemiology , Female , Humans , Male , Middle Aged , Principal Component Analysis
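The METHODS step above, a Mahalanobis distance computed on principal components, can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, the function name is hypothetical, and the standardisation and choice of reference population in the study may well differ.

```python
import numpy as np

def mahalanobis_from_pcs(X, n_components):
    """Mahalanobis distance of each row of X from the sample centroid,
    computed in the space of the leading principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardise each variable
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]   # indices of the largest ones
    scores = Z @ eigvecs[:, order]                     # principal component scores
    # PCs are uncorrelated, so the distance is a variance-scaled Euclidean norm
    return np.sqrt(np.sum(scores**2 / eigvals[order], axis=1))

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))                          # synthetic stand-in for biomarker data
md = mahalanobis_from_pcs(X, n_components=5)
print(md[:3])
```

When all components are retained, this equals the ordinary Mahalanobis distance on the standardised variables; dropping the trailing components removes the directions with the smallest variance, which otherwise tend to dominate the distance.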
10.
BMC Bioinformatics ; 22(Suppl 2): 196, 2021 Apr 26.
Article in English | MEDLINE | ID: mdl-33902443

ABSTRACT

BACKGROUND: Linear regression models are important tools for learning regulatory networks from gene expression time series. A conventional assumption for non-homogeneous regulatory processes on a short time scale is that the network structure stays constant across time, while the network parameters are time-dependent. The objective is then to learn the network structure along with changepoints that divide the time series into time segments. An uncoupled model learns the parameters separately for each segment, while a coupled model enforces the parameters of any segment to stay similar to those of the previous segment. In this paper, we propose a new consensus model that infers for each individual time segment whether it is coupled to (or uncoupled from) the previous segment. RESULTS: The results show that the new consensus model is superior to the uncoupled and the coupled model, as well as superior to a recently proposed generalized coupled model. CONCLUSIONS: The newly proposed model has the uncoupled and the coupled model as limiting cases, and it is able to infer the best trade-off between them from the data.


Subjects
Algorithms , Gene Regulatory Networks , Bayes Theorem , Linear Models
11.
J Environ Manage ; 276: 111296, 2020 Dec 15.
Article in English | MEDLINE | ID: mdl-32906073

ABSTRACT

Drought is a complex natural hazard. It occurs due to a prolonged period of deficient rainfall in a certain region. Unlike other natural hazards, drought has a recurrent occurrence. Therefore, comprehensive drought monitoring is essential for regional climate control and water management authorities. In this paper, we propose a new drought indicator: the Seasonally Combinative Regional Drought Indicator (SCRDI). The SCRDI integrates Bayesian networking theory with the Standardized Precipitation Temperature Index (SPTI) at varying gauge stations in various months/seasons. Application of SCRDI is based on five gauging stations of the Northern Area of Pakistan. We have found that the proposed indicator accounts for the effect of climate variation within a specified territory, accurately characterizes drought by capturing seasonal dependencies in a geospatial variation scenario, and reduces the large/complex data for future drought monitoring. In summary, the proposed indicator can be used for comprehensive characterization and assessment of drought in a given region.


Subjects
Droughts , Bayes Theorem , Pakistan , Seasons , Temperature
12.
Bioinformatics ; 36(4): 1198-1207, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31504191

ABSTRACT

MOTIVATION: Non-homogeneous dynamic Bayesian networks (NH-DBNs) are a popular tool for learning networks with time-varying interaction parameters. A multiple changepoint process is used to divide the data into disjoint segments and the network interaction parameters are assumed to be segment-specific. The objective is to infer the network structure along with the segmentation and the segment-specific parameters from the data. The conventional (uncoupled) NH-DBNs do not allow for information exchange among segments, and the interaction parameters have to be learned separately for each segment. More advanced coupled NH-DBN models allow the interaction parameters to vary but enforce them to stay similar over time. As the enforced similarity of the network parameters can have counter-productive effects, we propose a new consensus NH-DBN model that combines features of the uncoupled and the coupled NH-DBN. The new model infers for each individual edge whether its interaction parameter stays similar over time (and should be coupled) or if it changes from segment to segment (and should stay uncoupled). RESULTS: Our new model yields higher network reconstruction accuracies than state-of-the-art models for synthetic and yeast network data. For gene expression data from A. thaliana our new model infers a plausible network topology and yields hypotheses about the light-dependencies of the gene interactions. AVAILABILITY AND IMPLEMENTATION: Data are available from earlier publications. Matlab code is available at Bioinformatics online. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Gene Regulatory Networks , Bayes Theorem , Gene Expression Profiling , Saccharomyces cerevisiae/genetics
13.
JMIR Med Inform ; 7(4): e15358, 2019 10 30.
Article in English | MEDLINE | ID: mdl-31670697

ABSTRACT

BACKGROUND: Hemodynamic assessment of critically ill patients is a challenging endeavor, and advanced monitoring techniques are often required to guide treatment choices. Given the technical complexity and occasional unavailability of these techniques, estimation of cardiac function based on clinical examination is valuable for critical care physicians to diagnose circulatory shock. Yet, the lack of knowledge on how to best conduct and teach the clinical examination to estimate cardiac function has reduced its accuracy to almost that of "flipping a coin." OBJECTIVE: The aim of this study was to investigate the decision-making process underlying estimates of cardiac function of patients acutely admitted to the intensive care unit (ICU) based on current standardized clinical examination using Bayesian methods. METHODS: Patient data were collected as part of the Simple Intensive Care Studies-I (SICS-I) prospective cohort study. All adult patients consecutively admitted to the ICU with an expected stay longer than 24 hours were included, for whom clinical examination was conducted and cardiac function was estimated. Using these data, first, the probabilistic dependencies between the examiners' estimates and the set of clinically measured variables upon which these rely were analyzed using a Bayesian network. Second, the accuracy of cardiac function estimates was assessed by comparison to the cardiac index values measured by critical care ultrasonography. RESULTS: A total of 1075 patients were included, of which 783 patients had validated cardiac index measurements. A Bayesian network analysis identified two clinical variables upon which cardiac function estimate is conditionally dependent, namely, noradrenaline administration and presence of delayed capillary refill time or mottling. When the patient received noradrenaline, the probability of cardiac function being estimated as reasonable or good P(ER,G) was lower, irrespective of whether the patient was mechanically ventilated (P[ER,G|ventilation, noradrenaline]=0.63, P[ER,G|ventilation, no noradrenaline]=0.91, P[ER,G|no ventilation, noradrenaline]=0.67, P[ER,G|no ventilation, no noradrenaline]=0.93). The same trend was found for capillary refill time or mottling. Sensitivity of estimating a low cardiac index was 26% and 39% and specificity was 83% and 74% for students and physicians, respectively. Positive and negative likelihood ratios were 1.53 (95% CI 1.19-1.97) and 0.87 (95% CI 0.80-0.95), respectively, overall. CONCLUSIONS: The conditional dependencies between clinical variables and the cardiac function estimates resulted in a network consistent with known physiological relations. Conditional probability queries allow for multiple clinical scenarios to be recreated, which provide insight into the possible thought process underlying the examiners' cardiac function estimates. This information can help develop interactive digital training tools for students and physicians and contribute toward the goal of further improving the diagnostic accuracy of clinical examination in ICU patients. TRIAL REGISTRATION: ClinicalTrials.gov NCT02912624; https://clinicaltrials.gov/ct2/show/NCT02912624.
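The likelihood ratios quoted in the RESULTS follow from sensitivity and specificity by the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The short sketch below applies these to the rounded per-group percentages from the abstract; the function name is ours, and because the inputs are rounded and reported per group, the output is not expected to reproduce the pooled figures exactly.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary diagnostic test."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg

# Rounded sensitivity and specificity as reported in the abstract
students = likelihood_ratios(0.26, 0.83)
physicians = likelihood_ratios(0.39, 0.74)

print(f"students:   LR+ = {students[0]:.2f}, LR- = {students[1]:.2f}")
print(f"physicians: LR+ = {physicians[0]:.2f}, LR- = {physicians[1]:.2f}")
```

Both groups come out with a positive likelihood ratio of roughly 1.5, in line with the pooled value of 1.53 reported above, which indicates that a clinical estimate of low cardiac index shifts the odds only modestly.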

14.
Bioinformatics ; 35(23): 5011-5017, 2019 12 01.
Article in English | MEDLINE | ID: mdl-31077287

ABSTRACT

MOTIVATION: One of the main goals in systems biology is to learn molecular regulatory networks from quantitative profile data. In particular, Gaussian graphical models (GGMs) are widely used network models in bioinformatics where variables (e.g. transcripts, metabolites or proteins) are represented by nodes, and pairs of nodes are connected with an edge according to their partial correlation. Reconstructing a GGM from data is a challenging task when the sample size is smaller than the number of variables. The main problem consists in finding the inverse of the covariance estimator which is ill-conditioned in this case. Shrinkage-based covariance estimators are a popular approach, producing an invertible 'shrunk' covariance. However, a proper significance test for the 'shrunk' partial correlation (i.e. the GGM edges) is an open challenge as a probability density including the shrinkage is unknown. In this article, we present (i) a geometric reformulation of the shrinkage-based GGM, and (ii) a probability density that naturally includes the shrinkage parameter. RESULTS: Our results show that the inference using this new 'shrunk' probability density is as accurate as Monte Carlo estimation (an unbiased non-parametric method) for any shrinkage value, while being computationally more efficient. We show on synthetic data how the novel test for significance allows an accurate control of the Type I error and outperforms the network reconstruction obtained by the widely used R package GeneNet. This is further highlighted in two gene expression datasets from stress response in Escherichia coli, and the effect of influenza infection in Mus musculus. AVAILABILITY AND IMPLEMENTATION: https://github.com/V-Bernal/GGM-Shrinkage. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Software , Animals , Mice , Monte Carlo Method , Normal Distribution , Systems Biology
15.
Bioinformatics ; 35(12): 2108-2117, 2019 06 01.
Article in English | MEDLINE | ID: mdl-30395165

ABSTRACT

MOTIVATION: Non-homogeneous dynamic Bayesian networks (NH-DBNs) are a popular modelling tool for learning cellular networks from time series data. In systems biology, time series are often measured under different experimental conditions, and not rarely only some network interaction parameters depend on the condition while the other parameters stay constant across conditions. For this situation, we propose a new partially NH-DBN, based on Bayesian hierarchical regression models with partitioned design matrices. With regard to our main application to semi-quantitative (immunoblot) timecourse data from mammalian target of rapamycin complex 1 (mTORC1) signalling, we also propose a Gaussian process-based method to solve the problem of non-equidistant time series measurements. RESULTS: On synthetic network data and on yeast gene expression data the new model leads to improved network reconstruction accuracies. We then use the new model to reconstruct the topologies of the circadian clock network in Arabidopsis thaliana and the mTORC1 signalling pathway. The inferred network topologies show features that are consistent with the biological literature. AVAILABILITY AND IMPLEMENTATION: All datasets have been made available with earlier publications. Our Matlab code is available upon request. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computer Simulation , Algorithms , Bayes Theorem , Gene Expression Profiling , Gene Regulatory Networks , Normal Distribution
16.
Methods Mol Biol ; 1883: 49-94, 2019.
Article in English | MEDLINE | ID: mdl-30547396

ABSTRACT

A challenging problem in systems biology is the reconstruction of gene regulatory networks from postgenomic data. A variety of reverse engineering methods from machine learning and computational statistics have been proposed in the literature. However, deciding on the best method to adopt for a particular application or data set might be a confusing task. The present chapter provides a broad overview of state-of-the-art methods with an emphasis on conceptual understanding rather than a deluge of mathematical details, and the pros and cons of the various approaches are discussed. Guidance on practical applications with pointers to publicly available software implementations is included. The chapter concludes with a comprehensive comparative benchmark study on simulated data and a real-world application taken from current plant systems biology.


Subjects
Data Science/methods , Gene Regulatory Networks , Genetic Models , Systems Biology/methods , Algorithms , Arabidopsis/genetics , Bayes Theorem , Data Science/instrumentation , Gene Expression Profiling/instrumentation , Gene Expression Profiling/methods , Normal Distribution , Software , Systems Biology/instrumentation
17.
Comput Stat ; 32(2): 717-761, 2017.
Article in English | MEDLINE | ID: mdl-32103862

ABSTRACT

Thermodynamic integration (TI) for computing marginal likelihoods is based on an inverse annealing path from the prior to the posterior distribution. In many cases, the resulting estimator suffers from high variability, which particularly stems from the prior regime. When comparing complex models with differences in a comparatively small number of parameters, intrinsic errors from sampling fluctuations may outweigh the differences in the log marginal likelihood estimates. In the present article, we propose a TI scheme that directly targets the log Bayes factor. The method is based on a modified annealing path between the posterior distributions of the two models compared, which systematically avoids the high variance prior regime. We combine this scheme with the concept of non-equilibrium TI to minimise discretisation errors from numerical integration. Results obtained on Bayesian regression models applied to standard benchmark data, and a complex hierarchical model applied to biopathway inference, demonstrate a significant reduction in estimator variance over state-of-the-art TI methods.
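For orientation, the standard thermodynamic integration identity that this abstract starts from can be written as below. The notation is ours rather than the paper's: D denotes the data, θ the parameters, and t in [0, 1] the inverse temperature; a proper prior is assumed.

```latex
p_t(\theta) \;\propto\; p(D \mid \theta)^{t}\, p(\theta),
\qquad
\log p(D) \;=\; \int_0^1 \mathbb{E}_{\theta \sim p_t}\!\left[\, \log p(D \mid \theta) \,\right] dt .
```

At t = 0 the expectation is taken under the prior, which is typically where the integrand varies most and is hardest to estimate. The modified path between the two posteriors that the abstract proposes is not reproduced here.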

18.
Stat Comput ; 27(4): 1003-1040, 2017.
Article in English | MEDLINE | ID: mdl-32226236

ABSTRACT

Inference of interaction networks represented by systems of differential equations is a challenging problem in many scientific disciplines. In the present article, we follow a semi-mechanistic modelling approach based on gradient matching. We investigate the extent to which key factors, including the kinetic model, statistical formulation and numerical methods, impact upon performance at network reconstruction. We emphasize general lessons for computational statisticians when faced with the challenge of model selection, and we assess the accuracy of various alternative paradigms, including recent widely applicable information criteria and different numerical procedures for approximating Bayes factors. We conduct the comparative evaluation with a novel inferential pipeline that systematically disambiguates confounding factors via an ANOVA scheme.

19.
Stat Appl Genet Mol Biol ; 14(2): 143-67, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25719342

ABSTRACT

There has been much interest in reconstructing bi-directional regulatory networks linking the circadian clock to metabolism in plants. A variety of reverse engineering methods from machine learning and computational statistics have been proposed and evaluated. The emphasis of the present paper is on combining models in a model ensemble to boost the network reconstruction accuracy, and to explore various model combination strategies to maximize the improvement. Our results demonstrate that a rich ensemble of predictors outperforms the best individual model, even if the ensemble includes poor predictors with inferior individual reconstruction accuracy. For our application to metabolomic and transcriptomic time series from various mutagenesis plants grown in different light-dark cycles we also show how to determine the optimal time lag between interactions, and we identify significant interactions with a randomization test. Our study predicts new statistically significant interactions between circadian clock genes and metabolites in Arabidopsis thaliana, and thus provides independent statistical evidence that the regulation of metabolism by the circadian clock is not uni-directional, but that there is a statistically significant feedback mechanism aiming from metabolism back to the circadian clock.


Subjects
Circadian Clocks/genetics , Metabolome/genetics , Arabidopsis/genetics , Arabidopsis Proteins/genetics , Circadian Rhythm/genetics , Plant Gene Expression Regulation/genetics , Plant Genes/genetics , Light , Genetic Models
20.
Stat Appl Genet Mol Biol ; 13(3): 227-73, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24864301

ABSTRACT

We assess the accuracy of various state-of-the-art statistics and machine learning methods for reconstructing gene and protein regulatory networks in the context of circadian regulation. Our study draws on the increasing availability of gene expression and protein concentration time series for key circadian clock components in Arabidopsis thaliana. In addition, gene expression and protein concentration time series are simulated from a recently published regulatory network of the circadian clock in A. thaliana, in which protein and gene interactions are described by a Markov jump process based on Michaelis-Menten kinetics. We closely follow recent experimental protocols, including the entrainment of seedlings to different light-dark cycles and the knock-out of various key regulatory genes. Our study provides relative network reconstruction accuracy scores for a critical comparative performance evaluation, and sheds light on a series of highly relevant questions: it quantifies the influence of systematically missing values related to unknown protein concentrations and mRNA transcription rates, it investigates the dependence of the performance on the network topology and the degree of recurrency, it provides deeper insight into when and why non-linear methods fail to outperform linear ones, it offers improved guidelines on parameter settings in different inference procedures, and it suggests new hypotheses about the structure of the central circadian gene regulatory network in A. thaliana.


Subjects
Arabidopsis/genetics , Arabidopsis/physiology , Circadian Rhythm/genetics , Plant Gene Expression Regulation , Gene Regulatory Networks , Statistics as Topic , Arabidopsis/radiation effects , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Area Under Curve , Bayes Theorem , Plant Gene Expression Regulation/radiation effects , Light , Genetic Models , ROC Curve , Regression Analysis