Results 1 - 20 of 38
1.
Bioinformatics ; 40(5)2024 May 02.
Article in English | MEDLINE | ID: mdl-38741151

ABSTRACT

MOTIVATION: Systems biology aims to better understand living systems through mathematical modelling of experimental and clinical data. A pervasive challenge in quantitative dynamical modelling is the integration of time series measurements, which often have high variability and low sampling resolution. Approaches are required to utilize such information while consistently handling uncertainties. RESULTS: We present BayModTS (Bayesian modelling of time series data), a new FAIR (findable, accessible, interoperable, and reusable) workflow for processing and analysing sparse and highly variable time series data. BayModTS consistently transfers uncertainties from data to model predictions, including process knowledge via parameterized models. Further, credible differences in the dynamics of different conditions can be identified by filtering noise. To demonstrate the power and versatility of BayModTS, we applied it to three hepatic datasets gathered from three different species and with different measurement techniques: (i) blood perfusion measurements by magnetic resonance imaging in rat livers after portal vein ligation, (ii) pharmacokinetic time series of different drugs in normal and steatotic mice, and (iii) CT-based volumetric assessment of human liver remnants after clinical liver resection. AVAILABILITY AND IMPLEMENTATION: The BayModTS codebase is available on GitHub at https://github.com/Systems-Theory-in-Systems-Biology/BayModTS. The repository contains a Python script for the executable BayModTS workflow and a widely applicable SBML (systems biology markup language) model for retarded transient functions. In addition, all examples from the paper are included in the repository. Data and code of the application examples are stored on DaRUS: https://doi.org/10.18419/darus-3876. The raw MRI ROI voxel data were uploaded to DaRUS: https://doi.org/10.18419/darus-3878. The steatosis metabolite data are published on FairdomHub: 10.15490/fairdomhub.1.study.1070.1.


Subject(s)
Bayes Theorem , Workflow , Animals , Rats , Humans , Mice , Systems Biology/methods , Liver/metabolism , Software , Magnetic Resonance Imaging/methods
2.
Nucleic Acids Res ; 51(13): 6622-6633, 2023 07 21.
Article in English | MEDLINE | ID: mdl-37246710

ABSTRACT

The specificity of DNMT1 for hemimethylated DNA is a central feature for the inheritance of DNA methylation. We investigated this property in competitive methylation kinetics using hemimethylated (HM), hemihydroxymethylated (OH) and unmethylated (UM) substrates with single CpG sites in a randomized sequence context. DNMT1 shows a strong flanking sequence dependent HM/UM specificity of 80-fold on average, which is slightly enhanced on long hemimethylated DNA substrates. To explain this strong effect of a single methyl group, we propose a novel model in which the presence of the 5mC methyl group changes the conformation of the DNMT1-DNA complex into an active conformation by steric repulsion. The HM/OH preference is flanking sequence dependent and on average only 13-fold, indicating that passive DNA demethylation by 5hmC generation is not efficient in many flanking contexts. The CXXC domain of DNMT1 has a moderate flanking sequence dependent contribution to HM/UM specificity during DNA association to DNMT1, but not if DNMT1 methylates long DNA molecules in processive methylation mode. Comparison of genomic methylation patterns from mouse ES cell lines with various deletions of DNMTs and TETs with our data revealed that the UM specificity profile is most related to cellular methylation patterns, indicating that de novo methylation activity of DNMT1 shapes the DNA methylome in these cells.


Subject(s)
DNA (Cytosine-5-)-Methyltransferases , DNA , Animals , Mice , DNA (Cytosine-5-)-Methyltransferases/metabolism , DNA (Cytosine-5-)-Methyltransferase 1/genetics , DNA (Cytosine-5-)-Methyltransferase 1/metabolism , DNA/chemistry , DNA Methylation , DNA Modification Methylases/genetics , Epigenesis, Genetic
3.
Bioinformatics ; 39(39 Suppl 1): i440-i447, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387158

ABSTRACT

MOTIVATION: The Chemical Master Equation (CME) is a set of linear differential equations that describes the evolution of the probability distribution on all possible configurations of a (bio-)chemical reaction system. Since the number of configurations and therefore the dimension of the CME rapidly increases with the number of molecules, its applicability is restricted to small systems. A widely applied remedy for this challenge is the use of moment-based approaches, which consider the evolution of the first few moments of the distribution as summary statistics for the complete distribution. Here, we investigate the performance of two moment-estimation methods for reaction systems whose equilibrium distributions encounter fat-tailedness and do not possess statistical moments. RESULTS: We show that estimation via stochastic simulation algorithm (SSA) trajectories loses consistency over time and that estimated moment values span a wide range even for large sample sizes. In comparison, the method of moments returns smooth moment estimates but is not able to indicate the non-existence of the allegedly predicted moments. We furthermore analyze the negative effect of a CME solution's fat-tailedness on SSA run times and explain inherent difficulties. While moment-estimation techniques are a commonly applied tool in the simulation of (bio-)chemical reaction networks, we conclude that they should be used with care, as neither the system definition nor the moment-estimation techniques themselves reliably indicate the potential fat-tailedness of the CME's solution.
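
To make the idea of SSA-based moment estimation concrete, here is a minimal sketch in Python. It is not taken from the paper: the birth-death system, the function names and all parameter values are illustrative assumptions, and this system is deliberately well behaved (its stationary distribution is Poisson, so all moments exist), unlike the fat-tailed cases the abstract warns about.

```python
import random


def ssa_birth_death(k_birth, k_death, x0, t_end, rng):
    """Gillespie SSA for 0 -> X (rate k_birth) and X -> 0 (rate k_death * x).

    Returns the molecule count at time t_end. Hypothetical example system.
    """
    t, x = 0.0, x0
    while True:
        a_birth = k_birth
        a_death = k_death * x
        a_total = a_birth + a_death
        if a_total == 0.0:
            return x  # no reaction can fire any more
        t += rng.expovariate(a_total)  # waiting time to the next reaction
        if t > t_end:
            return x
        # choose which reaction fires, proportional to its propensity
        if rng.random() * a_total < a_birth:
            x += 1
        else:
            x -= 1


def estimate_mean(n_runs, seed=0, **kwargs):
    """Sample mean of X(t_end) over n_runs independent SSA trajectories."""
    rng = random.Random(seed)
    samples = [ssa_birth_death(rng=rng, **kwargs) for _ in range(n_runs)]
    return sum(samples) / n_runs
```

For this toy system the estimate should settle near k_birth / k_death as the number of runs grows. The abstract's point is that no such convergence can be relied on when the underlying distribution has no finite moments.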


Subject(s)
Algorithms , Computer Simulation , Probability , Sample Size
4.
Bioinformatics ; 38(18): 4352-4359, 2022 09 15.
Article in English | MEDLINE | ID: mdl-35916726

ABSTRACT

MOTIVATION: The Chemical Master Equation is a stochastic approach to describe the evolution of a (bio)chemical reaction system. Its solution is a time-dependent probability distribution on all possible configurations of the system. As this number is typically large, the Master Equation is often practically unsolvable. The Method of Moments reduces the system to the evolution of a few moments, which are described by ordinary differential equations. Those equations are not closed, since lower order moments generally depend on higher order moments. Various closure schemes have been suggested to solve this problem. Two major problems with these approaches are first that they are open loop systems, which can diverge from the true solution, and second, some of them are computationally expensive. RESULTS: Here we introduce Quasi-Entropy Closure, a moment-closure scheme for the Method of Moments. It estimates higher order moments by reconstructing the distribution that minimizes the distance to a uniform distribution subject to lower order moment constraints. Quasi-Entropy Closure can be regarded as an advancement of Zero-Information Closure, which similarly maximizes the information entropy. Results show that both approaches outperform truncation schemes. Quasi-Entropy Closure is computationally much faster than Zero-Information Closure, although both methods consider solutions on the space of configurations and hence do not completely overcome the curse of dimensionality. In addition, our scheme includes a plausibility check for the existence of a distribution satisfying a given set of moments on the feasible set of configurations. All results are evaluated on different benchmark problems. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
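
The closure problem mentioned above can be seen in a one-reaction example. The following derivation is an illustrative sketch, not taken from the paper: it assumes the reaction $2X \to \emptyset$ with rate constant $c$ (a hypothetical choice), propensity $a(x) = \tfrac{c}{2}x(x-1)$ and a state change of $-2$ per firing.

```latex
\begin{align}
\frac{d}{dt}\,\mathbb{E}[X]
  &= \mathbb{E}\big[(-2)\,a(X)\big]
   = -c\,\big(\mathbb{E}[X^2] - \mathbb{E}[X]\big), \\
\frac{d}{dt}\,\mathbb{E}[X^2]
  &= \mathbb{E}\big[\big((X-2)^2 - X^2\big)\,a(X)\big]
   = -2c\,\big(\mathbb{E}[X^3] - 2\,\mathbb{E}[X^2] + \mathbb{E}[X]\big).
\end{align}
```

The equation for the first moment involves the second, and the equation for the second involves the third, and so on. A closure scheme replaces the highest moment that appears by an estimate computed from the lower ones.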


Subject(s)
Models, Biological , Stochastic Processes , Entropy , Probability , Statistical Distributions
5.
Stat Appl Genet Mol Biol ; 18(4)2019 07 26.
Article in English | MEDLINE | ID: mdl-31348764

ABSTRACT

Finite mixture models are widely used in the life sciences for data analysis. Yet, the calibration of these models to data is still challenging as the optimization problems are often ill-posed. This holds for censored and uncensored data, and is caused by symmetries and other types of non-identifiabilities. Here, we discuss the problem of parameter estimation and model selection for finite mixture models from a theoretical perspective. We provide a review of the existing literature and illustrate the ill-posedness of the calibration problem for mixtures of uniform distributions and mixtures of normal distributions. Furthermore, we assess the effect of interval censoring on this estimation problem. Interestingly, we find that a proper treatment of censoring can facilitate the estimation of the number of mixture components compared to inference from uncensored data, which at first glance is a surprising result. The aim of the manuscript is to raise awareness of challenges in the calibration of finite mixture models and to provide an overview of available techniques.
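
One of the symmetries referred to above is label switching: permuting the components of a mixture leaves the likelihood unchanged, so the parameters cannot be identified uniquely. The following Python sketch illustrates this for a two-component normal mixture. The data values and parameters are made up for illustration and the function names are not from the paper.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_log_likelihood(data, weights, mus, sigmas):
    """Log-likelihood of data under a finite mixture of normal components."""
    total = 0.0
    for x in data:
        density = sum(
            w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)
        )
        total += math.log(density)
    return total


# Hypothetical data and parameters, chosen only for illustration.
data = [-1.2, -0.8, 0.1, 2.9, 3.4, 3.1]
ll_original = mixture_log_likelihood(data, [0.3, 0.7], [0.0, 3.0], [1.0, 0.5])
# Same model with the two component labels swapped.
ll_swapped = mixture_log_likelihood(data, [0.7, 0.3], [3.0, 0.0], [0.5, 1.0])
```

Both calls describe the same distribution, so an optimizer sees at least two equally good optima for any two-component fit.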


Subject(s)
Models, Statistical , Data Interpretation, Statistical , Likelihood Functions , Normal Distribution
6.
J Biol Chem ; 293(37): 14407-14416, 2018 09 14.
Article in English | MEDLINE | ID: mdl-30045871

ABSTRACT

Many newly synthesized cellular proteins pass through the Golgi complex from where secretory transport carriers sort them to the plasma membrane and the extracellular environment. The formation of these secretory carriers at the trans-Golgi network is promoted by the protein kinase D (PKD) family of serine/threonine kinases. Here, using mathematical modeling and experimental validation of the PKD activation and substrate phosphorylation kinetics, we reveal that the expression level of the PKD substrate deleted in liver cancer 1 (DLC1), a Rho GTPase-activating protein that is inhibited by PKD-mediated phosphorylation, determines PKD activity at the Golgi membranes. RNAi-mediated depletion of DLC1 reduced PKD activity in a Rho-Rho-associated protein kinase (ROCK)-dependent manner, impaired the exocytosis of the cargo protein horseradish peroxidase, and was associated with the accumulation of the small GTPase RAB6 on Golgi membranes, indicating a protein-trafficking defect. In summary, our findings reveal that DLC1 maintains basal activation of PKD at the Golgi and Golgi secretory activity, in part by down-regulating Rho-ROCK signaling. We propose that PKD senses cytoskeletal changes downstream of DLC1 to coordinate Rho signaling with Golgi secretory function.


Subject(s)
GTPase-Activating Proteins/metabolism , Protein Kinase C/metabolism , Tumor Suppressor Proteins/metabolism , trans-Golgi Network/metabolism , Cell Line, Tumor , Enzyme Activation , Exocytosis , GTPase-Activating Proteins/genetics , HEK293 Cells , Humans , Intracellular Membranes/metabolism , Models, Biological , Phosphorylation , RNA Interference , Signal Transduction , Substrate Specificity , Tumor Suppressor Proteins/genetics , rab GTP-Binding Proteins/metabolism , rho-Associated Kinases/metabolism
7.
J Theor Biol ; 455: 86-96, 2018 10 14.
Article in English | MEDLINE | ID: mdl-30017944

ABSTRACT

The relation between design principles of signaling network motifs and their robustness against intrinsic noise remains elusive. In this work we investigate the role of cascading for coping with intrinsic noise due to stochasticity in molecular reactions. We use stochastic approaches to quantify fluctuations in the terminal kinase of phosphorylation-dephosphorylation cascade motifs and demonstrate that cascading highly affects these fluctuations. We show that this purely stochastic effect can be explained by time-varying sequestration of upstream kinase molecules. In particular, we discuss conditions on time scales and parameter regimes which lead to a reduction of output fluctuations. Our results are put into biological context by adapting rate parameters of our modeling approach to biologically feasible ranges for general binding-unbinding and phosphorylation-dephosphorylation mechanisms. Overall, this study reveals a novel role of stochastic sequestration for dynamic noise filtering in signaling cascade motifs.


Subject(s)
Computer Simulation , Models, Biological , Phosphotransferases/metabolism , Signal Transduction , Animals , Humans , Phosphorylation , Stochastic Processes
8.
Bioinformatics ; 32(16): 2464-72, 2016 08 15.
Article in English | MEDLINE | ID: mdl-27153627

ABSTRACT

MOTIVATION: The statistical analysis of single-cell data is a challenge in cell biological studies. Tailored statistical models and computational methods are required to resolve the subpopulation structure, i.e. to correctly identify and characterize subpopulations. These approaches also support the unraveling of sources of cell-to-cell variability. Finite mixture models have shown promise, but the available approaches are ill suited to the simultaneous consideration of data from multiple experimental conditions and to censored data. The prevalence and relevance of single-cell data and the lack of suitable computational analytics make automated methods, that are able to deal with the requirements posed by these data, necessary. RESULTS: We present MEMO, a flexible mixture modeling framework that enables the simultaneous, automated analysis of censored and uncensored data acquired under multiple experimental conditions. MEMO is based on maximum-likelihood inference and allows for testing competing hypotheses. MEMO can be applied to a variety of different single-cell data types. We demonstrate the advantages of MEMO by analyzing right and interval censored single-cell microscopy data. Our results show that an examination of censoring and the simultaneous consideration of different experimental conditions are necessary to reveal biologically meaningful subpopulation structures. MEMO allows for a stringent analysis of single-cell data and enables researchers to avoid misinterpretation of censored data. Therefore, MEMO is a valuable asset for all fields that infer the characteristics of populations by looking at single individuals such as cell biology and medicine. AVAILABILITY AND IMPLEMENTATION: MEMO is implemented in MATLAB and freely available via github (https://github.com/MEMO-toolbox/MEMO). CONTACTS: eva-maria.geissen@ist.uni-stuttgart.de or nicole.radde@ist.uni-stuttgart.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
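
As a rough illustration of how censoring enters a maximum-likelihood fit, the sketch below computes a log-likelihood in which exact observations contribute a density, right-censored observations contribute a survival probability, and interval-censored observations contribute a difference of cumulative probabilities. It is written in Python with a single normal component for brevity. MEMO itself is a MATLAB toolbox for mixture models, and none of the names below correspond to its interface.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * math.erfc(-(x - mu) / (sigma * math.sqrt(2.0)))


def normal_logpdf(x, mu, sigma):
    """Log-density of a normal distribution."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def censored_log_likelihood(observations, mu, sigma):
    """Log-likelihood for a mix of exact and censored observations.

    Each observation is a tuple (hypothetical encoding):
      ("exact", x)               value observed directly
      ("right", lower)           only known to exceed `lower`
      ("interval", lower, upper) only known to lie in [lower, upper]
    """
    total = 0.0
    for obs in observations:
        kind = obs[0]
        if kind == "exact":
            total += normal_logpdf(obs[1], mu, sigma)
        elif kind == "right":
            total += math.log(1.0 - normal_cdf(obs[1], mu, sigma))
        elif kind == "interval":
            upper = normal_cdf(obs[2], mu, sigma)
            lower = normal_cdf(obs[1], mu, sigma)
            total += math.log(upper - lower)
        else:
            raise ValueError(f"unknown observation type: {kind}")
    return total
```

Treating a right-censored value as if it were an exact measurement would replace the survival term by a density term and bias the fit, which is the kind of misinterpretation the abstract refers to.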


Subject(s)
Computational Biology/methods , Models, Statistical , Humans , Probability
9.
Bioinformatics ; 30(20): 2991-2, 2014 Oct 15.
Article in English | MEDLINE | ID: mdl-25005749

ABSTRACT

SUMMARY: We present a new C implementation of an advanced Markov chain Monte Carlo (MCMC) method for the sampling of ordinary differential equation (ode) model parameters. The software mcmc_clib uses the simplified manifold Metropolis-adjusted Langevin algorithm (SMMALA), which is locally adaptive; it uses the parameter manifold's geometry (the Fisher information) to make efficient moves. This adaptation does not diminish with MC length, which is highly advantageous compared with adaptive Metropolis techniques when the parameters have large correlations and/or posteriors substantially differ from multivariate Gaussians. The software is standalone (not a toolbox), though dependencies include the GNU scientific library and sundials libraries for ode integration and sensitivity analysis. AVAILABILITY AND IMPLEMENTATION: The source code and binary files are freely available for download at http://a-kramer.github.io/mcmc_clib/. This also includes example files and data. A detailed documentation, an example model and user manual are provided with the software. CONTACT: andrei.kramer@ist.uni-stuttgart.de.


Subject(s)
Algorithms , Markov Chains , Models, Statistical , Monte Carlo Method , Software
10.
BMC Bioinformatics ; 15: 253, 2014 Jul 28.
Article in English | MEDLINE | ID: mdl-25066046

ABSTRACT

BACKGROUND: Parameter estimation for differential equation models of intracellular processes is a highly relevant but challenging task. The available experimental data do not usually contain enough information to identify all parameters uniquely, resulting in ill-posed estimation problems with often highly correlated parameters. Sampling-based Bayesian statistical approaches are appropriate for tackling this problem. The samples are typically generated via Markov chain Monte Carlo; however, such methods are computationally expensive and their convergence may be slow, especially if there are strong correlations between parameters. Monte Carlo methods based on Euclidean or Riemannian Hamiltonian dynamics have been shown to outperform other samplers by making proposal moves that take the local sensitivities of the system's states into account and accepting these moves with high probability. However, the high computational cost involved with calculating the Hamiltonian trajectories prevents their widespread use for all but the smallest differential equation models. The further development of efficient sampling algorithms is therefore an important step towards improving the statistical analysis of predictive models of intracellular processes. RESULTS: We show how state of the art Hamiltonian Monte Carlo methods may be significantly improved for steady state dynamical models. We present a novel approach for efficiently calculating the required geometric quantities by tracking steady states across the Hamiltonian trajectories using a Newton-Raphson method and employing local sensitivity information. Using our approach, we compare both Euclidean and Riemannian versions of Hamiltonian Monte Carlo on three models for intracellular processes with real data and demonstrate at least an order of magnitude improvement in the effective sampling speed. We further demonstrate the wider applicability of our approach to other gradient based MCMC methods, such as those based on Langevin diffusions. CONCLUSION: Our approach is strictly beneficial in all test cases. The Matlab sources implementing our MCMC methodology are available from https://github.com/a-kramer/ode_rmhmc.


Subject(s)
Algorithms , Models, Biological , Monte Carlo Method , Systems Biology/methods , Bayes Theorem , Humans , Insulin/pharmacology , MAP Kinase Signaling System , Markov Chains , Phosphorylation/drug effects , Receptor, Insulin/metabolism
11.
BMC Bioinformatics ; 14 Suppl 19: S2, 2013.
Article in English | MEDLINE | ID: mdl-24564335

ABSTRACT

BACKGROUND: Mathematical models are nowadays widely used to describe biochemical reaction networks. One of the main reasons for this is that models facilitate the integration of a multitude of different data and data types using parameter estimation. Thereby, models allow for a holistic understanding of biological processes. However, due to measurement noise and the limited amount of data, uncertainties in the model parameters should be considered when conclusions are drawn from estimated model attributes, such as reaction fluxes or transient dynamics of biological species. METHODS AND RESULTS: We developed the visual analytics system iVUN that supports uncertainty-aware analysis of static and dynamic attributes of biochemical reaction networks modeled by ordinary differential equations. The multivariate graph of the network is visualized as a node-link diagram, and statistics of the attributes are mapped to the color of nodes and links of the graph. In addition, the graph view is linked with several views, such as line plots, scatter plots, and correlation matrices, to support locating uncertainties and the analysis of their time dependencies. As demonstration, we use iVUN to quantitatively analyze the dynamics of a model for Epo-induced JAK2/STAT5 signaling. CONCLUSION: Our case study showed that iVUN can be used to perform an in-depth study of biochemical reaction networks, including attribute uncertainties, correlations between these attributes and their uncertainties as well as the attribute dynamics. In particular, the linking of different visualization options turned out to be highly beneficial for the complex analysis tasks that come with the biological systems as presented here.


Subject(s)
Models, Biological , Models, Chemical , Uncertainty , Computational Biology/methods , Computer Graphics , Metabolic Networks and Pathways , Signal Transduction
12.
Bioinformatics ; 28(18): i535-i541, 2012 Sep 15.
Article in English | MEDLINE | ID: mdl-22962478

ABSTRACT

MOTIVATION: Experiment design strategies for biomedical models with the purpose of parameter estimation or model discrimination are the focus of intense research. Experimental limitations such as sparse and noisy data result in unidentifiable parameters and render related design tasks challenging problems. Often, the temporal resolution of data is a limiting factor and the number of possible experimental interventions is finite. To address this issue, we propose a Bayesian experiment design algorithm to minimize the prediction uncertainty for a given set of experiments and compare it to traditional A-optimal design. RESULTS: In an in-depth numerical study involving an ordinary differential equation model of the trans-Golgi network with 12 partly non-identifiable parameters, we minimized the prediction uncertainty efficiently for predefined scenarios. The introduced method achieves twice the prediction precision of the same number of A-optimal designed experiments while introducing a useful stopping criterion. The simulation intensity of the algorithm's major design step is thereby reasonably affordable. Besides smaller variances in the predicted trajectories compared with Fisher design, we could also achieve smaller parameter posterior distribution entropies, rendering this method superior to A-optimal Fisher design also in the parameter space. AVAILABILITY: Necessary software/toolbox information is available in the supplementary material. The project script including example data can be downloaded from http://www.ist.uni-stuttgart.de/%7eweber/BayesFisher2012. CONTACT: patrick.weber@ist.uni-stuttgart.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Models, Biological , Bayes Theorem , Research Design , Secretory Pathway , Uncertainty , trans-Golgi Network/metabolism
13.
J Theor Biol ; 337: 174-80, 2013 Nov 21.
Article in English | MEDLINE | ID: mdl-24001971

ABSTRACT

Here we present a minimal mathematical model for the sphingomyelin synthase 1 (SMS1) driven conversion of ceramide to sphingomyelin based on chemical reaction kinetics. We demonstrate via mathematical analysis that this model is not able to qualitatively reproduce experimental measurements on lipid compositions after altering SMS1 activity. We prove that a positive feedback mechanism from the products to the reactants of the reaction is one possible model extension to explain these specific experimental data. The proposed mechanism in fact exists in vivo via protein kinase D and the ceramide transfer protein CERT. The model is further evaluated by additional observations from the literature.


Subject(s)
Feedback, Physiological , Golgi Apparatus/metabolism , Models, Biological , Transferases (Other Substituted Phosphate Groups)/metabolism , Animals , CHO Cells , Cricetinae , Cricetulus , Diglycerides/metabolism , Gene Knockdown Techniques , HeLa Cells , Humans , Protein Kinase C/metabolism , Protein Transport
14.
IET Syst Biol ; 17(1): 1-13, 2023 02.
Article in English | MEDLINE | ID: mdl-36440585

ABSTRACT

Sparse and noisy measurements make parameter estimation for biochemical reaction networks difficult and might lead to ill-posed optimisation problems. This is exacerbated if the data have to be normalised and only fold changes rather than absolute amounts are available. Here, the authors consider the propagation of measurement noise to the distribution of the maximum likelihood (ML) estimator in an in silico study. To this end, a model of a reversible reaction is considered, whose reaction rate constants are estimated from fold changes. Noise propagation is analysed for different normalisation strategies and different error models. In particular, accuracy, precision, and asymptotic properties of the ML estimator are investigated. Results show that normalisation by the mean of a time series outperforms normalisation by a single time point in the example provided by the authors. Moreover, the error model with a heavy-tail distribution is slightly more robust to large measurement noise, but, beyond this, the choice of the error model did not have a significant impact on the estimation results provided by the authors.
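
The two normalisation strategies compared above can be written down in a few lines. This Python sketch is only an illustration: the function names and the choice of the first time point as the default reference are assumptions, not details from the paper.

```python
def normalise_by_mean(series):
    """Fold changes relative to the mean of the whole time series."""
    mean = sum(series) / len(series)
    return [value / mean for value in series]


def normalise_by_point(series, index=0):
    """Fold changes relative to a single reference time point."""
    reference = series[index]
    return [value / reference for value in series]


# Hypothetical measurements at three time points.
measurements = [2.0, 4.0, 6.0]
by_mean = normalise_by_mean(measurements)    # [0.5, 1.0, 1.5]
by_point = normalise_by_point(measurements)  # [1.0, 2.0, 3.0]
```

A plausible reading of the reported result is that a single reference point passes its own measurement error on to every normalised value, whereas the mean averages over the noise of all time points. This interpretation is an assumption here and is not stated in the abstract.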


Subject(s)
Biochemical Phenomena , Likelihood Functions , Time Factors
15.
Sci Rep ; 13(1): 2695, 2023 02 15.
Article in English | MEDLINE | ID: mdl-36792648

ABSTRACT

The Systems Biology community has taken numerous actions to develop data and modeling standards towards FAIR data and model handling. Nevertheless, the debate about incentives and rewards for individual researchers to make their results reproducible is ongoing. Here, we pose the specific question of whether reproducible models have a higher impact in terms of citations. Therefore, we statistically analyze 328 published models recently classified by Tiwari et al. based on their reproducibility. For hypothesis testing, we use a flexible Bayesian approach that provides complete distributional information for all quantities of interest and can handle outliers. The results show that in the period from 2013, i.e., 10 years after the introduction of SBML, to 2020, the group of reproducible models is significantly more cited than the non-reproducible group. We show that differences in journal impact factors do not explain this effect and that this effect increases with additional standardization of data and error model integration via PEtab. Overall, our statistical analysis demonstrates the long-term merits of reproducible modeling for the individual researcher in terms of citations. Moreover, it provides evidence for the increased use of reproducible models in the scientific community.


Subject(s)
Journal Impact Factor , Systems Biology , Bayes Theorem , Reproducibility of Results , Publications
16.
J Chem Theory Comput ; 19(24): 9049-9059, 2023 Dec 26.
Article in English | MEDLINE | ID: mdl-38051675

ABSTRACT

In this article, we introduce a novel moment closure scheme based on concepts from model predictive control (MPC) to accurately describe the time evolution of the statistical moments of the solution of the chemical master equation (CME). The method of moments, a set of ordinary differential equations frequently used to calculate the first n_m moments, is generally not closed since lower-order moments depend on higher-order moments. To overcome this limitation, we interpret the moment equations as a nonlinear dynamical system, where the first n_m moments serve as states, and the closing moments serve as the control input. We demonstrate the efficacy of our approach using three example systems and show that it outperforms existing closure schemes. For polynomial systems, which encompass all mass-action systems, we provide probability bounds for the error between true and estimated moment trajectories. We achieve this by combining the convergence properties of a priori moment estimates from stochastic simulations with guarantees for nonlinear reference tracking MPC. Our proposed method offers an effective solution to accurately predict the time evolution of moments of the CME, which has wide-ranging implications for many fields, including biology, chemistry, and engineering.

17.
FEBS J ; 290(8): 2115-2126, 2023 04.
Article in English | MEDLINE | ID: mdl-36416580

ABSTRACT

In previous work, we have developed a DNA methylation-based epigenetic memory system that operates in Escherichia coli to detect environmental signals, trigger a phenotypic switch of the cells and store the information in DNA methylation. The system is based on the CcrM DNA methyltransferase and a synthetic zinc finger (ZnF4), which binds DNA in a CcrM methylation-dependent manner and functions as a repressor for a ccrM gene expressed together with an egfp reporter gene. Here, we developed a reversible reset for this memory system by adding an increased concentration of ZnSO4 to the bacterial cultivation medium and demonstrate that one bacterial culture could be reversibly switched ON and OFF in several cycles. We show that a previously developed differential equation model of the memory system can also describe the new data. Then, we studied the long-term stability of the ON-state of the system over approximately 100 cell divisions showing a gradual loss of ON-state signal starting after 4 days of cultivation that is caused by individual cells switching from an ON- into the OFF-state. Over time, the methylation of the ZnF4-binding sites is not fully maintained leading to an increased OFF switching probability of cells, because stronger binding of ZnF4 to partially demethylated operator sites leads to further reductions in the cellular concentrations of CcrM. These data will support future design to further stabilize the ON-state and enforce the binary switching behaviour of the system. Together with the development of a reversible OFF switch, our new findings strongly increase the capabilities of bacterial epigenetic biosensors.


Subject(s)
Epigenetic Memory , Gene Expression Regulation, Bacterial , Site-Specific DNA-Methyltransferase (Adenine-Specific)/genetics , Site-Specific DNA-Methyltransferase (Adenine-Specific)/metabolism , Bacteria/metabolism , DNA Methylation , DNA/metabolism
18.
Stat Methods Med Res ; 31(5): 947-958, 2022 05.
Article in English | MEDLINE | ID: mdl-35072570

ABSTRACT

The extraction of novel information from omics data is a challenging task, in particular, since the number of features (e.g. genes) often far exceeds the number of samples. In such a setting, conventional parameter estimation leads to ill-posed optimization problems, and regularization may be required. In addition, outliers can largely impact classification accuracy. Here we introduce ROSIE, an ensemble classification approach, which combines three sparse and robust classification methods for outlier detection and feature selection and further performs a bootstrap-based validity check. Outliers of ROSIE are determined by the rank product test using outlier rankings of all three methods, and important features are selected as features commonly selected by all methods. We apply ROSIE to RNA-Seq data from The Cancer Genome Atlas (TCGA) to classify observations into Triple-Negative Breast Cancer (TNBC) and non-TNBC tissue samples. The pre-processed dataset consists of 16,600 genes and more than 1,000 samples. We demonstrate that ROSIE selects important features and outliers in a robust way. Identified outliers are concordant with the distribution of the commonly selected genes by the three methods, and results are in line with other independent studies. Furthermore, we discuss the association of some of the selected genes with the TNBC subtype in other investigations. In summary, ROSIE constitutes a robust and sparse procedure to identify outliers and important genes through binary classification. Our approach is ad hoc applicable to other datasets, fulfilling the overall goal of simultaneously identifying outliers and candidate disease biomarkers to be targeted in therapy research and personalized medicine frameworks.


Subject(s)
Triple Negative Breast Neoplasms , Humans , Triple Negative Breast Neoplasms/genetics
19.
ACS Synth Biol ; 11(7): 2445-2455, 2022 07 15.
Article in English | MEDLINE | ID: mdl-35749318

ABSTRACT

Oscillations are an important component in biological systems; grasping their mechanisms and regulation, however, is difficult. Here, we use the theory of dynamical systems to support the design of oscillatory systems based on epigenetic control elements. Specifically, we use results that extend the Poincaré-Bendixson theorem for monotone control systems that are coupled to a negative feedback circuit. The methodology is applied to a synthetic epigenetic memory system based on DNA methylation that serves as a monotone control system, which is coupled to a negative feedback. This system is generally able to show sustained oscillations according to its structure; however, a first experimental implementation showed that fine-tuning of several parameters is required. We provide design support by exploring the experimental design space using systems-theoretic analysis of a computational model.


Subject(s)
Feedback, Physiological , Protein Processing, Post-Translational , Epigenesis, Genetic/genetics , Feedback , Methylation , Models, Biological
20.
Commun Biol ; 5(1): 92, 2022 01 24.
Article in English | MEDLINE | ID: mdl-35075236

ABSTRACT

TET dioxygenases convert 5-methylcytosine (5mC) preferentially in a CpG context into 5-hydroxymethylcytosine (5hmC) and higher oxidized forms, thereby initiating DNA demethylation, but details regarding the effects of the DNA sequences flanking the target 5mC site on TET activity are unknown. We investigated oxidation of libraries of DNA substrates containing one 5mC or 5hmC residue in randomized sequence context using single molecule readout of oxidation activity and sequence and show pronounced 20 and 70-fold flanking sequence effects on the catalytic activities of TET1 and TET2, respectively. Flanking sequence preferences were similar for TET1 and TET2 and also for 5mC and 5hmC substrates. Enhanced flanking sequence preferences were observed at non-CpG sites together with profound effects of flanking sequences on the specificity of TET2. TET flanking sequence preferences are reflected in genome-wide and local patterns of 5hmC and DNA demethylation in human and mouse cells indicating that they influence genomic DNA modification patterns in combination with locus specific targeting of TET enzymes.


Subject(s)
5-Methylcytosine/analogs & derivatives , DNA-Binding Proteins/metabolism , Dioxygenases/metabolism , Gene Expression Regulation/physiology , Proto-Oncogene Proteins/metabolism , 5-Methylcytosine/metabolism , Animals , Base Sequence , Chromatography, High Pressure Liquid , Cloning, Molecular , Computational Biology , DNA-Binding Proteins/genetics , Dioxygenases/genetics , Genomics , Mice , Proto-Oncogene Proteins/genetics , Tandem Mass Spectrometry