Search | VHL Regional Portal

DataXflow: Synergizing data-driven modeling with best parameter fit and optimal control - An efficient data analysis for cancer research.

Crouch, Samantha A W; Krause, Jan; Dandekar, Thomas; Breitenbach, Tim.

Comput Struct Biotechnol J ; 23: 1755-1772, 2024 Dec.

Article in English | MEDLINE | ID: mdl-38707537

ABSTRACT

Building data-driven models is an effective strategy for information extraction from empirical data. Adapting model parameters specifically to data with a best fitting approach encodes the relevant information into a mathematical model. Subsequently, an optimal control framework extracts the most efficient targets to steer the model into desired changes via external stimuli. The DataXflow software framework integrates three software pipelines: D2D for model fitting; a framework for solving optimal control problems that include external stimuli; and JimenaE, which provides graphical user interfaces for the other two frameworks, lowering the need for programming skills while automating recurring modeling tasks. Such tasks include equation generation from a graph and script generation, which also makes it possible to approach systems with many agents, such as complex gene regulatory networks. A desired state of the model is defined, and therapeutic interventions are modeled as external stimuli. The optimal control framework purposefully exploits the model-encoded information by providing those external stimuli that effect the desired changes most efficiently. The implementation of DataXflow is available under https://github.com/MarvelousHopefull/DataXflow. We showcase its application by detecting specific drug targets for a therapy of lung cancer from measurement data to lower proliferation and increase apoptosis. By an iterative modeling process refining the topology of the model, the regulatory network of the tumor is generated from the data. An application of the optimal control framework in our example reveals the inhibition of AURKA and the activation of CDH1 as the most efficient drug target combination. DataXflow paves the way to an agile interplay between data generation and its analysis, potentially accelerating cancer research through efficient drug target identification, even in complex networks.

An orchestra of machine learning methods reveals landmarks in single-cell data exemplified with aging fibroblasts.

Rasbach, Lauritz; Caliskan, Aylin; Saderi, Fatemeh; Dandekar, Thomas; Breitenbach, Tim.

PLoS One ; 19(4): e0302045, 2024.

Article in English | MEDLINE | ID: mdl-38630692

ABSTRACT

In this work, a Python framework for characteristic feature extraction is developed and applied to gene expression data of human fibroblasts. Unlabeled feature selection objectively determines groups and minimal gene sets separating groups. ML explainability methods transform the features correlating with phenotypic differences into causal reasoning, supported by further pipeline and visualization tools, allowing user knowledge to boost causal reasoning. The purpose of the framework is to identify characteristic features that are causally related to phenotypic differences of single cells. The pipeline consists of several data science methods enriched with purposeful visualization of the intermediate results in order to check them systematically and infuse the domain knowledge about the investigated process. A specific focus is to extract a small but meaningful set of genes to facilitate causal reasoning for the phenotypic differences. One application could be drug target identification. For this purpose, the framework follows different steps: feature reduction (PFA), low dimensional embedding (UMAP), clustering ((H)DBSCAN), feature correlation (chi-square, mutual information), ML validation and explainability (SHAP, tree explainer). The pipeline is validated by identifying and correctly separating signature genes associated with aging in fibroblasts from single-cell gene expression measurements: PLK3, polo-like protein kinase 3; CCDC88A, Coiled-Coil Domain Containing 88A; STAT3, signal transducer and activator of transcription-3; ZNF7, Zinc Finger Protein 7; SLC24A2, solute carrier family 24 member 2 and lncRNA RP11-372K14.2. The code for the preprocessing step can be found in the GitHub repository https://github.com/AC-PHD/NoLabelPFA, along with the characteristic feature extraction https://github.com/LauritzR/characteristic-feature-extraction.
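The feature-correlation step named in the abstract (mutual information between a gene and the cluster labels) can be illustrated with a minimal sketch. The toy data, the gene names `gene_a` and `gene_b`, and the helper `mutual_information` are assumptions made for illustration only; they are not taken from the authors' repositories.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equally long discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of the feature values
    py = Counter(ys)            # marginal counts of the labels
    pxy = Counter(zip(xs, ys))  # joint counts
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        mi += p_joint * log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi


# Hypothetical toy data: binarized expression of two genes in six cells and
# one cluster label per cell (as a clustering step might produce).
clusters = ["young", "young", "young", "old", "old", "old"]
genes = {
    "gene_a": [0, 0, 0, 1, 1, 1],  # follows the cluster labels exactly
    "gene_b": [0, 1, 0, 1, 0, 1],  # only weakly related to the labels
}

# Rank genes by how much information they carry about the cluster labels.
ranking = sorted(
    genes, key=lambda g: mutual_information(genes[g], clusters), reverse=True
)
print(ranking)
```

Genes at the top of such a ranking are candidates for a small signature set; the published pipeline combines this kind of score with the other steps listed in the abstract.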

Subject(s)

Aging , Machine Learning , Humans , Microfilament Proteins , Vesicular Transport Proteins

Optimized cell type signatures revealed from single-cell data by combining principal feature analysis, mutual information, and machine learning.

Caliskan, Aylin; Caliskan, Deniz; Rasbach, Lauritz; Yu, Weimeng; Dandekar, Thomas; Breitenbach, Tim.

Comput Struct Biotechnol J ; 21: 3293-3314, 2023.

Article in English | MEDLINE | ID: mdl-37333862

ABSTRACT

Machine learning techniques are well suited to analyzing expression data from single cells. These techniques impact all fields ranging from cell annotation and clustering to signature identification. The presented framework evaluates how well gene selection sets separate defined phenotypes or cell groups. This innovation overcomes the present limitation in objectively and correctly identifying a small gene set with high information content for separating phenotypes; corresponding code scripts are provided. The small but meaningful subset of the original genes (or feature space) facilitates human interpretability of the differences between the phenotypes, including those found by machine learning, and may even turn correlations between genes and phenotypes into a causal explanation. For the feature selection task, the principal feature analysis is utilized, which reduces redundant information while selecting genes that carry the information for separating the phenotypes. In this context, the presented framework shows explainability of unsupervised learning as it reveals cell-type specific signatures. Apart from a Seurat preprocessing tool and the PFA script, the pipeline uses mutual information to balance accuracy and size of the gene set if desired. A validation part that evaluates the gene selection for its information content regarding the separation of the phenotypes is provided as well; binary and multiclass classification of 3 or 4 groups are studied. Results from different single-cell data are presented. In each, only about ten out of more than 30000 genes are identified as carrying the relevant information. The code is provided in a GitHub repository at https://github.com/AC-PHD/Seurat_PFA_pipeline.

Software JimenaE allows efficient dynamic simulations of Boolean networks, centrality and system state analysis.

Kaltdorf, Martin; Breitenbach, Tim; Karl, Stefan; Fuchs, Maximilian; Kessie, David Komla; Psota, Eric; Prelog, Martina; Sarukhanyan, Edita; Ebert, Regina; Jakob, Franz; Dandekar, Gudrun; Naseem, Muhammad; Liang, Chunguang; Dandekar, Thomas.

Sci Rep ; 13(1): 1855, 2023 02 01.

Article in English | MEDLINE | ID: mdl-36725967

ABSTRACT

The signal modelling framework JimenaE dynamically simulates Boolean networks. In contrast to SQUAD, there is systematic and not just heuristic calculation of all system states. These specific features are not present in CellNetAnalyzer and BoolNet. JimenaE is an expert extension of Jimena, with new optimized code, network conversion into different formats, and rapid convergence both for system state calculation and for all three network centralities. It allows higher accuracy in determining network states, and allows networks to be dissected and the type and amount of network control to be identified for each protein with high accuracy. Biological examples demonstrate this: (i) High plasticity of mesenchymal stromal cells for differentiation into chondrocytes, osteoblasts and adipocytes; differentiation-specific network control focusses on wnt-, TGF-beta and PPAR-gamma signaling. JimenaE allows the study of individual proteins and the removal or addition of interactions (or autocrine loops), and accurately quantifies effects as well as the number of system states. (ii) Dynamical modelling of cell-cell interactions of the plant Arabidopsis thaliana against Pseudomonas syringae DC3000: We analyze for the first time the pathogen perspective and its interaction with the host. We next provide a detailed analysis of how plant hormonal regulation stimulates specific proteins and which protein has which type and amount of network control, including a detailed heatmap of the A. thaliana response distinguishing between two states of the immune response. (iii) In an immune response network of dendritic cells confronted with Aspergillus fumigatus, JimenaE now accurately calculates the specific values for centralities and protein-specific network control, including chemokine and pattern recognition receptors.
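To make the idea of a systematic (exhaustive rather than heuristic) calculation of system states concrete, here is a minimal sketch for a purely discrete Boolean network under synchronous update. The three-node network and the function names are hypothetical, and the sketch does not reproduce JimenaE's own algorithm or its interpolation of Boolean rules.

```python
from itertools import product

# Hypothetical toy network (not from the paper):
# A and B inhibit each other, and C follows A.
rules = {
    "A": lambda s: not s["B"],
    "B": lambda s: not s["A"],
    "C": lambda s: s["A"],
}


def synchronous_step(state):
    """Apply every update rule once, all nodes at the same time."""
    return {node: bool(rule(state)) for node, rule in rules.items()}


def fixed_points():
    """Enumerate all 2**n states and keep those that map onto themselves."""
    nodes = sorted(rules)
    found = []
    for values in product([False, True], repeat=len(nodes)):
        state = dict(zip(nodes, values))
        if synchronous_step(state) == state:
            found.append(state)
    return found


steady_states = fixed_points()
for state in steady_states:
    print(state)
```

Exhaustive enumeration grows as 2^n with the number of nodes, so it is only practical for small networks; it is used here because it cannot miss a steady state.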

Subject(s)

Proteins , Software , Signal Transduction , Cell Communication , Cell Differentiation

Optimization of synthetic molecular reporters for a mesenchymal glioblastoma transcriptional program by integer programing.

Breitenbach, Tim; Schmitt, Matthias Jürgen; Dandekar, Thomas.

Bioinformatics ; 38(17): 4162-4171, 2022 09 02.

Article in English | MEDLINE | ID: mdl-35809064

ABSTRACT

MOTIVATION: A recent approach to perform genetic tracing of complex biological problems involves the generation of synthetic deoxyribonucleic acid (DNA) probes that specifically mark cells with a phenotype of interest. These synthetic locus control regions (sLCRs), in turn, drive the expression of a reporter gene, such as fluorescent protein. To build functional and specific sLCRs, it is critical to accurately select multiple bona fide cis-regulatory elements from the target cell phenotype cistrome. This selection occurs by maximizing the number and diversity of transcription factors (TFs) within the sLCR, yet the size of the final sLCR should remain limited. RESULTS: In this work, we discuss how optimization, in particular integer programing, can be used to systematically address the construction of a specific sLCR and optimize pre-defined properties of the sLCR. Our presented instance of a linear optimization problem maximizes the activation potential of the sLCR such that its size is limited to a pre-defined length and a minimum number of all TFs deemed sufficiently characteristic for the phenotype of interest is covered. We generated an sLCR to trace the mesenchymal glioblastoma program in patients by solving our corresponding linear program with the software optimizer Gurobi. Considering the binding strength of transcription factor binding sites (TFBSs) with their TFs as a proxy for activation potential, the optimized sLCR scores similarly to an sLCR experimentally validated in vivo, and is smaller in size while having the same coverage of TFBSs. AVAILABILITY AND IMPLEMENTATION: We provide a Python implementation of the presented framework in the Supplementary Material with which an optimal selection of cis-regulatory elements can be calculated once the target set of TFs and their binding strength with their TFBSs is known. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
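The selection problem described in the abstract (maximize activation potential subject to a length limit and a minimum number of covered TFs) can be sketched on a toy instance. The candidate elements, their lengths, scores and TF names below are invented for illustration, and exhaustive enumeration stands in for the integer-programming solver used in the paper; the exact published formulation may differ.

```python
from itertools import combinations

# Hypothetical candidates: name -> (length in bp, activation score, TFs bound)
elements = {
    "e1": (120, 5.0, {"TF_A", "TF_B"}),
    "e2": (200, 7.0, {"TF_B", "TF_C"}),
    "e3": (150, 4.0, {"TF_C"}),
    "e4": (90, 3.0, {"TF_A", "TF_D"}),
}


def best_selection(elements, max_length, min_tfs):
    """Return the highest-scoring feasible subset and its score.

    Feasible means: total length <= max_length and at least min_tfs
    distinct TFs covered. Brute force over all subsets (toy sizes only).
    """
    best, best_score = None, float("-inf")
    names = sorted(elements)
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            length = sum(elements[n][0] for n in subset)
            covered = set().union(*(elements[n][2] for n in subset))
            score = sum(elements[n][1] for n in subset)
            if length <= max_length and len(covered) >= min_tfs and score > best_score:
                best, best_score = subset, score
    return best, best_score


selection, score = best_selection(elements, max_length=420, min_tfs=4)
print(selection, score)
```

With binary decision variables per element, the same objective and constraints are linear, which is why an integer-programming solver can handle realistic instance sizes that brute force cannot.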

Subject(s)

Glioblastoma , Humans , Binding Sites/genetics , Glioblastoma/genetics , Transcription Factors/metabolism , Protein Binding , Regulatory Sequences, Nucleic Acid

A modular systems biological modelling framework studies cyclic nucleotide signaling in platelets.

Breitenbach, Tim; Englert, Nils; Osmanoglu, Özge; Rukoyatkina, Natalia; Wangorsch, Gaby; Heinze, Katrin; Friebe, Andreas; Butt, Elke; Feil, Robert; Dittrich, Marcus; Gambaryan, Stepan; Dandekar, Thomas.

J Theor Biol ; 550: 111222, 2022 10 07.

Article in English | MEDLINE | ID: mdl-35843440

ABSTRACT

BACKGROUND: The cyclic nucleotides cAMP and cGMP inhibit platelet activation. Different platelet signaling modules work together. We develop here a modelling framework to integrate different signaling modules and apply it to platelets. RESULTS: We introduce a novel standardized bilinear coupling mechanism that allows submodel debugging and standardized coupling, with optimal data-driven modelling by methods from optimization. Besides cAMP signaling, our model considers specific cGMP effects, including external stimuli by drugs. Moreover, the output of the cGMP module serves as input for a modular model of VASP phosphorylation and for the activity of cAMP and cGMP pathways in platelets. Experimental data-driven modeling allows us to design models with quantitative output. We use the condensed information about involved regulation and system responses for modeling drug effects and obtaining optimal experimental settings. Stepwise further validation of our model is given by direct experimental data. CONCLUSIONS: We present a general framework for model integration using modules and their stimulus responses. We demonstrate it by a multi-modular model for platelet signaling focusing on cGMP and VASP phosphorylation. Moreover, this allows the estimation of drug action on any of the inhibitory cyclic nucleotide pathways (cGMP, cAMP) and is supported by experimental data.

Subject(s)

Blood Platelets , Cyclic AMP , Cyclic GMP , Nucleotides, Cyclic , Phosphoproteins , Phosphorylation

An effective model of endogenous clocks and external stimuli determining circadian rhythms.

Breitenbach, Tim; Helfrich-Förster, Charlotte; Dandekar, Thomas.

Sci Rep ; 11(1): 16165, 2021 08 09.

Article in English | MEDLINE | ID: mdl-34373483

ABSTRACT

Circadian endogenous clocks of eukaryotic organisms are an established and rapidly developing research field. To investigate and simulate in an effective model the effect of external stimuli on such clocks and their components, we developed a software framework for download and simulation. The application is useful for understanding the different effects involved in a mathematically simple and effective model. This concerns the effects of Zeitgebers, feedback loops and further modifying components. We start from a known mathematical oscillator model, which is based on experimental molecular findings. This is extended with an effective framework that includes the impact of external stimuli on the circadian oscillations, including high dose pharmacological treatment. In particular, the external stimuli framework defines a systematic procedure by input-output-interfaces to couple different oscillators. The framework is validated by providing phase response curves and ranges of entrainment. Furthermore, Aschoff's rule is computationally investigated. It is shown how the external stimuli framework can be used to study biological effects like points of singularity or oscillators integrating different signals at once. The mathematical framework and formalism are generic and allow the general study of the effect of external stimuli on oscillators and other biological processes. For an easy replication of each numerical experiment presented in this work and an easy implementation of the framework, the corresponding Mathematica files are made fully available. They can be downloaded at the following link: https://www.biozentrum.uni-wuerzburg.de/bioinfo/computing/circadian/ .

Subject(s)

Circadian Clocks/physiology , Circadian Rhythm/physiology , Models, Biological , Animals , Circadian Clocks/genetics , Circadian Rhythm/genetics , Computer Simulation , Drosophila melanogaster/genetics , Drosophila melanogaster/physiology , Food , Light , Mice , Photic Stimulation , Software , Suprachiasmatic Nucleus/physiology

Analyzing pharmacological intervention points: A method to calculate external stimuli to switch between steady states in regulatory networks.

Breitenbach, Tim; Liang, Chunguang; Beyersdorf, Niklas; Dandekar, Thomas.

PLoS Comput Biol ; 15(7): e1007075, 2019 07.

Article in English | MEDLINE | ID: mdl-31310618

ABSTRACT

Once biological systems are modeled by regulatory networks, the next step is to include external stimuli, which model the experimental possibilities to affect the activity level of certain nodes of the network, in a mathematical framework. This framework can then be interpreted as a mathematical optimal control framework, so that optimization algorithms can be used to determine external stimuli that cause a desired switch from an initial state of the network to another final state. These external stimuli are the intervention points for the corresponding biological experiment to obtain the desired outcome of the considered experiment. In this work, the model of regulatory networks is extended to controlled regulatory networks. For this purpose, external stimuli are considered which can affect the activity of the network's nodes by activation or inhibition. A method is presented to calculate a selection of external stimuli that causes a switch between two different steady states of a regulatory network. A software solution based on Jimena and MathWorks MATLAB is provided. Furthermore, numerical examples are presented to demonstrate application and scope of the software on networks of 4 nodes, 11 nodes and 36 nodes. Moreover, we analyze the aggregation of platelets and the behavior of a basic T-helper cell protein-protein interaction network and its maturation towards Th0, Th1, Th2, Th17 and Treg cells in accordance with experimental data.
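The notion of an external stimulus that switches a system between two steady states can be illustrated on a one-variable toy model. The bistable equation dx/dt = x - x^3 + u(t), the stimulus amplitude, and the grid search over stimulus durations are all assumptions chosen for illustration; they are not the network models or the optimization algorithm of the paper.

```python
def simulate(stimulus_duration, amplitude=0.5, x0=-1.0, dt=0.01, t_end=30.0):
    """Euler integration of dx/dt = x - x**3 + u(t); returns the final x.

    Without stimulus the toy system has stable steady states at x = -1
    and x = +1. The stimulus u equals `amplitude` for t < stimulus_duration
    and is zero afterwards.
    """
    x = x0
    for step in range(int(round(t_end / dt))):
        t = step * dt
        u = amplitude if t < stimulus_duration else 0.0
        x += dt * (x - x**3 + u)
    return x


def minimal_switching_duration(candidates):
    """Return the first (shortest) candidate duration that ends near x = +1."""
    for duration in candidates:
        if simulate(duration) > 0.9:
            return duration
    return None


# Candidate stimulus durations from 0 to 10 time units in steps of 0.5.
candidates = [0.5 * k for k in range(21)]
shortest = minimal_switching_duration(candidates)
print("shortest switching stimulus:", shortest)
```

A stimulus that is too short lets the system fall back to its initial steady state, while a sufficiently long one pushes it into the basin of the other state. An optimal control formulation replaces this grid search with an optimization over admissible stimuli.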

Subject(s)

Computer Simulation , Models, Biological , Protein Interaction Maps , T-Lymphocytes, Helper-Inducer/metabolism

How to Steer and Control ERK and the ERK Signaling Cascade Exemplified by Looking at Cardiac Insufficiency.

Breitenbach, Tim; Lorenz, Kristina; Dandekar, Thomas.

Int J Mol Sci ; 20(9)2019 May 02.

Article in English | MEDLINE | ID: mdl-31052520

ABSTRACT

A mathematical optimization framework allows the identification of certain nodes within a signaling network. In this work, we analyzed the complex extracellular-signal-regulated kinase 1 and 2 (ERK1/2) cascade in cardiomyocytes using the framework to find efficient adjustment screws for this cascade, which is important for cardiomyocyte survival and maladaptive heart muscle growth. We modeled optimal pharmacological intervention points that are beneficial for the heart, but avoid the occurrence of a maladaptive ERK1/2 modification, the autophosphorylation of ERK at threonine 188 (ERK Thr 188 phosphorylation), which causes cardiac hypertrophy. For this purpose, a network of a cardiomyocyte that was fitted to experimental data was equipped with external stimuli that model the pharmacological intervention points. Specifically, two situations were considered. In the first one, the cardiomyocyte was driven to a desired expression level with different treatment strategies. These strategies were quantified with respect to beneficial effects and maleficent side effects, and the best treatment strategy was then determined. In the second situation, it was shown how to model constitutively activated pathways and how to identify drug targets to obtain a desired activity level that is associated with a healthy state, in contrast to the maleficent expression pattern caused by the constitutively activated pathway. An implementation of the algorithms used for the calculations is also presented in this paper, which simplifies the application of the presented framework for drug targeting, optimal drug combinations and the systematic and automatic search for pharmacological intervention points. The codes were designed such that they can be combined with any mathematical model given by ordinary differential equations.

Subject(s)

Cardiomegaly/drug therapy , MAP Kinase Signaling System/drug effects , Myocytes, Cardiac/drug effects , Myocytes, Cardiac/metabolism , Algorithms , Cardiomegaly/metabolism , Humans , Mitogen-Activated Protein Kinase 1/metabolism , Mitogen-Activated Protein Kinase 3/metabolism , Models, Cardiovascular , Molecular Targeted Therapy/methods , Myocytes, Cardiac/pathology , Phosphorylation/drug effects
