ABSTRACT
SUMMARY: Mechanistic models are important tools to describe and understand biological processes. However, they typically rely on unknown parameters, the estimation of which can be challenging for large and complex systems. pyPESTO is a modular framework for systematic parameter estimation, with scalable algorithms for optimization and uncertainty quantification. While tailored to ordinary differential equation problems, pyPESTO is broadly applicable to black-box parameter estimation problems. In addition to its own implementations, it provides a unified interface to various popular simulation and inference methods. AVAILABILITY AND IMPLEMENTATION: pyPESTO is implemented in Python and is open-source under a 3-Clause BSD license. Code and documentation are available on GitHub (https://github.com/icb-dcm/pypesto).
Subjects
Algorithms , Software , Computer Simulation , Uncertainty , Documentation , Models, Biological
ABSTRACT
MOTIVATION: Unknown parameters of dynamical models are commonly estimated from experimental data. However, while various efficient optimization and uncertainty analysis methods have been proposed for quantitative data, methods for qualitative data are rare and suffer from bad scaling and convergence. RESULTS: Here, we propose an efficient and reliable framework for estimating the parameters of ordinary differential equation models from qualitative data. In this framework, we derive a semi-analytical algorithm for gradient calculation of the optimal scaling method developed for qualitative data. This enables the use of efficient gradient-based optimization algorithms. We demonstrate that the use of gradient information improves performance of optimization and uncertainty quantification on several application examples. On average, we achieve a speedup of more than one order of magnitude compared to gradient-free optimization. In addition, in some examples, the gradient-based approach yields substantially improved objective function values and quality of the fits. Accordingly, the proposed framework substantially improves the parameterization of models from qualitative data. AVAILABILITY AND IMPLEMENTATION: The proposed approach is implemented in the open-source Python Parameter EStimation TOolbox (pyPESTO). pyPESTO is available at https://github.com/ICB-DCM/pyPESTO. All application examples and code to reproduce this study are available at https://doi.org/10.5281/zenodo.4507613. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Models, Biological , Software , Algorithms , Uncertainty , Research Design
ABSTRACT
We need to effectively combine the knowledge from surging literature with complex datasets to propose mechanistic models of SARS-CoV-2 infection, improving data interpretation and predicting key targets of intervention. Here, we describe a large-scale community effort to build an open access, interoperable and computable repository of COVID-19 molecular mechanisms. The COVID-19 Disease Map (C19DMap) is a graphical, interactive representation of disease-relevant molecular mechanisms linking many knowledge sources. Notably, it is a computational resource for graph-based analyses and disease modelling. To this end, we established a framework of tools, platforms and guidelines necessary for a multifaceted community of biocurators, domain experts, bioinformaticians and computational biologists. The diagrams of the C19DMap, curated from the literature, are integrated with relevant interaction and text mining databases. We demonstrate the application of network analysis and modelling approaches by concrete examples to highlight new testable hypotheses. This framework helps to find signatures of SARS-CoV-2 predisposition, treatment response or prioritisation of drug candidates. Such an approach may help deal with new waves of COVID-19 or similar pandemics in the long-term perspective.
Subjects
COVID-19/immunology , Computational Biology/methods , Databases, Factual , SARS-CoV-2/immunology , Software , Antiviral Agents/therapeutic use , COVID-19/genetics , COVID-19/virology , Computer Graphics , Cytokines/genetics , Cytokines/immunology , Data Mining/statistics & numerical data , Gene Expression Regulation , Host Microbial Interactions/genetics , Host Microbial Interactions/immunology , Humans , Immunity, Cellular/drug effects , Immunity, Humoral/drug effects , Immunity, Innate/drug effects , Lymphocytes/drug effects , Lymphocytes/immunology , Lymphocytes/virology , Metabolic Networks and Pathways/genetics , Metabolic Networks and Pathways/immunology , Myeloid Cells/drug effects , Myeloid Cells/immunology , Myeloid Cells/virology , Protein Interaction Mapping , SARS-CoV-2/drug effects , SARS-CoV-2/genetics , SARS-CoV-2/pathogenicity , Signal Transduction , Transcription Factors/genetics , Transcription Factors/immunology , Viral Proteins/genetics , Viral Proteins/immunology , COVID-19 Drug Treatment
ABSTRACT
Reproducibility and reusability of the results of data-based modeling studies are essential. Yet, so far, there has been no broadly supported format for the specification of parameter estimation problems in systems biology. Here, we introduce PEtab, a format which facilitates the specification of parameter estimation problems using Systems Biology Markup Language (SBML) models and a set of tab-separated value files describing the observation model and experimental data as well as parameters to be estimated. We have already implemented PEtab support in eight well-established model simulation and parameter estimation toolboxes with hundreds of users in total. We provide a Python library for validation and modification of a PEtab problem and currently 20 example parameter estimation problems based on recent studies.
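To make the idea of tab-separated problem files concrete, the following is a minimal sketch that parses a small parameter table using only the Python standard library. The column names are assumed to match those of the PEtab parameter table. The parameter IDs, bounds and values are invented for illustration, and the helper functions are hypothetical and not part of the PEtab Python library. The actual format defines further tables, columns and validation rules that are not shown here.

```python
import csv
import io

# Hypothetical parameter table; IDs and numbers are made up for illustration.
PARAMETER_TSV = (
    "parameterId\tparameterScale\tlowerBound\tupperBound\tnominalValue\testimate\n"
    "k_on\tlog10\t1e-3\t1e3\t0.5\t1\n"
    "k_off\tlog10\t1e-3\t1e3\t2.0\t1\n"
    "scaling\tlin\t0\t10\t1.0\t0\n"
)


def read_parameter_table(text):
    """Parse a tab-separated parameter table into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        rows.append({
            "id": row["parameterId"],
            "scale": row["parameterScale"],
            "lower": float(row["lowerBound"]),
            "upper": float(row["upperBound"]),
            "nominal": float(row["nominalValue"]),
            # "1" marks a parameter to be estimated, "0" a fixed one.
            "estimate": row["estimate"] == "1",
        })
    return rows


def estimated_ids(rows):
    """Return the IDs of parameters flagged for estimation."""
    return [r["id"] for r in rows if r["estimate"]]


parameters = read_parameter_table(PARAMETER_TSV)
free_parameters = estimated_ids(parameters)
```

In this toy table, two rate constants are flagged for estimation and the scaling factor is held fixed at its nominal value.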
Subjects
Programming Languages , Systems Biology/methods , Algorithms , Databases, Factual , Models, Biological , Models, Statistical , Reproducibility of Results
ABSTRACT
MOTIVATION: Mechanistic models of biochemical reaction networks facilitate the quantitative understanding of biological processes and the integration of heterogeneous datasets. However, some biological processes require the consideration of comprehensive reaction networks and therefore large-scale models. Parameter estimation for such models poses great challenges, in particular when the data are on a relative scale. RESULTS: Here, we propose a novel hierarchical approach combining (i) the efficient analytic evaluation of optimal scaling, offset and error model parameters with (ii) the scalable evaluation of objective function gradients using adjoint sensitivity analysis. We evaluate the properties of the methods by parameterizing a pan-cancer ordinary differential equation model (>1000 state variables, >4000 parameters) using relative protein, phosphoprotein and viability measurements. The hierarchical formulation improves optimizer performance considerably. Furthermore, we show that this approach allows estimating error model parameters with negligible computational overhead when no experimental estimates are available, providing an unbiased way to weight heterogeneous data. Overall, our hierarchical formulation is applicable to a wide range of models, and allows for the efficient parameterization of large-scale models based on heterogeneous relative measurements. AVAILABILITY AND IMPLEMENTATION: Supplementary code and data are available online at http://doi.org/10.5281/zenodo.3254429 and http://doi.org/10.5281/zenodo.3254441. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
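The abstract does not state the formulas behind the analytic evaluation of scaling and error model parameters. As a hedged illustration of the general idea, the sketch below uses the standard least-squares result for one scaling factor and one shared Gaussian noise variance, with the offset omitted for brevity. The function names and numbers are hypothetical, and this is not the implementation used in the study.

```python
def optimal_scaling(data, simulation):
    """Closed-form scaling s that minimizes sum((y - s*h)**2)."""
    numerator = sum(y * h for y, h in zip(data, simulation))
    denominator = sum(h * h for h in simulation)
    if denominator == 0.0:
        raise ValueError("simulation is identically zero; scaling is undefined")
    return numerator / denominator


def optimal_noise_variance(data, simulation, scaling):
    """Maximum-likelihood variance for Gaussian noise with one shared sigma."""
    residuals = [y - scaling * h for y, h in zip(data, simulation)]
    return sum(r * r for r in residuals) / len(residuals)


# Illustrative values: unscaled model output and measurements on a relative scale.
simulated = [1.0, 2.0, 4.0]
measured = [2.1, 3.9, 8.0]

s_hat = optimal_scaling(measured, simulated)
var_hat = optimal_noise_variance(measured, simulated, s_hat)
```

Because `s_hat` and `var_hat` follow directly from the simulation, an outer optimizer would only need to search over the remaining dynamic parameters, which is the intuition behind a hierarchical formulation.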
Subjects
Models, Biological , Software , Algorithms , Research Design
ABSTRACT
Quantitative dynamical models facilitate the understanding of biological processes and the prediction of their dynamics. These models usually comprise unknown parameters, which have to be inferred from experimental data. For quantitative experimental data, there are several methods and software tools available. However, for qualitative data the available approaches are limited and computationally demanding. Here, we consider the optimal scaling method which has been developed in statistics for categorical data and has been applied to dynamical systems. This approach turns qualitative variables into quantitative ones, accounting for constraints on their relation. We derive a reduced formulation for the optimization problem defining the optimal scaling. The reduced formulation possesses the same optimal points as the established formulation but requires fewer degrees of freedom. Parameter estimation for dynamical models of cellular pathways revealed that the reduced formulation improves the robustness and convergence of optimizers. This resulted in substantially reduced computation times. We implemented the proposed approach in the open-source Python Parameter EStimation TOolbox (pyPESTO) to facilitate reuse and extension. The proposed approach enables efficient parameterization of quantitative dynamical models using qualitative data.
Subjects
Models, Biological , Software , Algorithms , Cell Physiological Phenomena
ABSTRACT
PURPOSE: Development of a computational biomarker to predict, prior to treatment, the response to CDK4/6 inhibition (CDK4/6i) in combination with endocrine therapy in patients with breast cancer. EXPERIMENTAL DESIGN: A mechanistic mathematical model that accounts for protein signaling and drug mechanisms of action was developed and trained on extensive, publicly available data from breast cancer cell lines. The model was built to provide a patient-specific response score based on the expression of six genes (CCND1, CCNE1, ESR1, RB1, MYC, and CDKN1A). The model was validated in five independent cohorts of 148 patients in total with early-stage or advanced breast cancer treated with endocrine therapy and CDK4/6i. Response was measured either by evaluating Ki67 levels and PAM50 risk of relapse (ROR) after neoadjuvant treatment or by evaluating progression-free survival (PFS). RESULTS: The model showed significant association with patient outcomes in all five cohorts. The model predicted high Ki67 [area under the curve; AUC (95% confidence interval, CI) of 0.80 (0.64-0.92), 0.81 (0.60-1.00) and 0.80 (0.65-0.93)] and high PAM50 ROR [AUC of 0.78 (0.64-0.89)]. This observation was not obtained in patients treated with chemotherapy. In the other cohorts, patient stratification based on the model prediction was significantly associated with PFS [hazard ratio (HR) = 2.92 (95% CI, 1.08-7.86), P = 0.034 and HR = 2.16 (1.02-4.55), P = 0.043]. CONCLUSIONS: A mathematical modeling approach accurately predicts patient outcome following CDK4/6i plus endocrine therapy, marking a step toward more personalized treatments in patients with Luminal B breast cancer.
Subjects
Breast Neoplasms , Cyclin-Dependent Kinase 4 , Cyclin-Dependent Kinase 6 , Humans , Female , Cyclin-Dependent Kinase 4/antagonists & inhibitors , Cyclin-Dependent Kinase 4/genetics , Cyclin-Dependent Kinase 6/antagonists & inhibitors , Cyclin-Dependent Kinase 6/genetics , Breast Neoplasms/drug therapy , Breast Neoplasms/pathology , Breast Neoplasms/genetics , Breast Neoplasms/mortality , Breast Neoplasms/metabolism , Middle Aged , Biomarkers, Tumor/genetics , Protein Kinase Inhibitors/therapeutic use , Protein Kinase Inhibitors/pharmacology , Aged , Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Adult , Prognosis , Computer Simulation , Models, Theoretical , Ki-67 Antigen/metabolism , Antineoplastic Agents, Hormonal/therapeutic use , Antineoplastic Agents, Hormonal/pharmacology
ABSTRACT
Tumor heterogeneity is an important driver of treatment failure in cancer since therapies often select for drug-tolerant or drug-resistant cellular subpopulations that drive tumor growth and recurrence. Profiling the drug-response heterogeneity of tumor samples using traditional genomic deconvolution methods has yielded limited results, due in part to the imperfect mapping between genomic variation and functional characteristics. Here, we leverage mechanistic population modeling to develop a statistical framework for profiling phenotypic heterogeneity from standard drug-screen data on bulk tumor samples. This method, called PhenoPop, reliably identifies tumor subpopulations exhibiting differential drug responses and estimates their drug sensitivities and frequencies within the bulk population. We apply PhenoPop to synthetically generated cell populations, mixed cell-line experiments, and multiple myeloma patient samples and demonstrate how it can provide individualized predictions of tumor growth under candidate therapies. This methodology can also be applied to deconvolution problems in a variety of biological settings beyond cancer drug response.
Subjects
Antineoplastic Agents , Neoplasms , Humans , Early Detection of Cancer , Neoplasms/drug therapy , Antineoplastic Agents/pharmacology , Cell Line , Genomics
ABSTRACT
Quantitative dynamic models are widely used to study cellular signal processing. A critical step in modelling is the estimation of unknown model parameters from experimental data. As model sizes and datasets are steadily growing, established parameter optimization approaches for mechanistic models become computationally extremely challenging. Mini-batch optimization methods, as employed in deep learning, have better scaling properties. In this work, we adapt, apply, and benchmark mini-batch optimization for ordinary differential equation (ODE) models, thereby establishing a direct link between dynamic modelling and machine learning. On our main application example, a large-scale model of cancer signaling, we benchmark mini-batch optimization against established methods, achieving better optimization results and reducing computation by more than an order of magnitude. We expect that our work will serve as a first step towards mini-batch optimization tailored to ODE models and enable modelling of even larger and more complex systems than what is currently possible.
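As a rough sketch of what mini-batch optimization means for a dynamic model, the example below estimates a single decay rate from several experimental conditions, updating the parameter from a random subset of conditions at each step. It is a toy problem with an analytic ODE solution and plain stochastic gradient descent. The model, data, step size and function names are illustrative assumptions and do not reproduce the adapted algorithms or the benchmark described in the abstract.

```python
import math
import random


def simulate(k, x0, t):
    """Analytic solution of dx/dt = -k*x with x(0) = x0."""
    return x0 * math.exp(-k * t)


def batch_gradient(k, batch):
    """Mean gradient of the squared error with respect to k over one mini-batch."""
    grad = 0.0
    for x0, t, y in batch:
        pred = simulate(k, x0, t)
        # d(pred)/dk = -t * pred, so the chain rule gives the term below.
        grad += 2.0 * (pred - y) * (-t * pred)
    return grad / len(batch)


def minibatch_descent(conditions, k_init, batch_size, learning_rate, epochs, seed=0):
    """Plain mini-batch gradient descent over shuffled experimental conditions."""
    rng = random.Random(seed)
    k = k_init
    indices = list(range(len(conditions)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = [conditions[i] for i in indices[start:start + batch_size]]
            k -= learning_rate * batch_gradient(k, batch)
    return k


# Noise-free synthetic data: each condition is (initial value, time, measurement).
TRUE_K = 0.7
conditions = [
    (x0, t, simulate(TRUE_K, x0, t))
    for x0 in (0.5, 1.0, 1.5, 2.0)
    for t in (0.5, 1.0, 1.5, 2.0)
]

k_estimate = minibatch_descent(
    conditions, k_init=0.3, batch_size=4, learning_rate=0.1, epochs=200
)
```

Each update only simulates the conditions in the current batch, so the cost per step is independent of the total number of conditions. This is the scaling property that makes the approach attractive for large datasets.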
Subjects
Computational Biology/methods , Machine Learning , Algorithms , Benchmarking , Cell Line, Tumor , Gene Knockout Techniques , Humans , Models, Biological , Neoplasms , Signal Transduction , Software
ABSTRACT
Ordinary differential equation (ODE) models are a key tool to understand complex mechanisms in systems biology. These models are studied using various approaches, including stability and bifurcation analysis, but most frequently by numerical simulations. The number of required simulations is often large, e.g., when unknown parameters need to be inferred. This renders efficient and reliable numerical integration methods essential. However, these methods depend on various hyperparameters, which strongly impact the ODE solution. Despite this, and although hundreds of published ODE models are freely available in public databases, a thorough study that quantifies the impact of hyperparameters on the ODE solver in terms of accuracy and computation time is still missing. In this manuscript, we investigate which choices of algorithms and hyperparameters are generally favorable when dealing with ODE models arising from biological processes. To ensure a representative evaluation, we considered 142 published models. Our study provides evidence that most ODEs in computational biology are stiff, and we give guidelines for the choice of algorithms and hyperparameters. We anticipate that our results will help researchers in systems biology to choose appropriate numerical methods when dealing with ODE models.
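To illustrate why stiffness and solver settings matter, the following textbook-style sketch integrates the linear test equation dy/dt = -rate * y with a fixed step that is too large for an explicit method. It compares explicit and implicit Euler only. The abstract does not name the solvers or hyperparameters that were studied, so the methods, rate and step size here are illustrative assumptions rather than the study's setup.

```python
def explicit_euler(rate, y0, step, n_steps):
    """Explicit Euler for dy/dt = -rate * y."""
    y = y0
    for _ in range(n_steps):
        y = y + step * (-rate * y)
    return y


def implicit_euler(rate, y0, step, n_steps):
    """Implicit Euler for dy/dt = -rate * y."""
    y = y0
    for _ in range(n_steps):
        # Solve y_new = y + step * (-rate * y_new) for y_new (linear in this case).
        y = y / (1.0 + step * rate)
    return y


RATE = 1000.0   # fast decay, i.e. a stiff time scale
STEP = 0.01     # step * rate = 10, far outside the explicit stability limit of 2
N_STEPS = 20

y_explicit = explicit_euler(RATE, 1.0, STEP, N_STEPS)
y_implicit = implicit_euler(RATE, 1.0, STEP, N_STEPS)
```

The true solution decays to essentially zero. With this step size the explicit iterate grows in magnitude at every step, while the implicit iterate decays, which is why stiff models typically call for implicit methods or much smaller steps.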
ABSTRACT
Survival or apoptosis is a binary decision in individual cells. However, at the cell-population level, a graded increase in survival of colony-forming unit-erythroid (CFU-E) cells is observed upon stimulation with erythropoietin (Epo). To identify components of Janus kinase 2/signal transducer and activator of transcription 5 (JAK2/STAT5) signal transduction that contribute to the graded population response, we extended a cell-population-level model calibrated with experimental data to study the behavior in single cells. The single-cell model shows that the high cell-to-cell variability in nuclear phosphorylated STAT5 is caused by variability in the amount of Epo receptor (EpoR):JAK2 complexes and of SHP1, as well as the extent of nuclear import because of the large variance in the cytoplasmic volume of CFU-E cells. 24-118 pSTAT5 molecules in the nucleus for 120 min are sufficient to ensure cell survival. Thus, variability in membrane-associated processes is sufficient to convert a switch-like behavior at the single-cell level to a graded population-level response.
Subjects
Cytoplasm/metabolism , Erythroid Precursor Cells/cytology , Erythroid Precursor Cells/metabolism , Janus Kinase 2/metabolism , STAT5 Transcription Factor/metabolism , Signal Transduction , Animals , Calibration , Cell Nucleus/drug effects , Cell Nucleus/metabolism , Cell Survival/drug effects , Cells, Cultured , Computer Simulation , Erythropoietin/pharmacology , Mice, Inbred BALB C , Models, Biological , Phosphorylation/drug effects , Signal Transduction/drug effects
ABSTRACT
Mechanistic models are essential to deepen the understanding of complex diseases at the molecular level. Nowadays, high-throughput molecular and phenotypic characterizations are possible, but the integration of such data with prior knowledge on signaling pathways is limited by the availability of scalable computational methods. Here, we present a computational framework for the parameterization of large-scale mechanistic models and its application to the prediction of drug response of cancer cell lines from exome and transcriptome sequencing data. This framework is over 10^4 times faster than state-of-the-art methods, which enables modeling at previously infeasible scales. By applying the framework to a model describing major cancer-associated pathways (>1,200 species and >2,600 reactions), we could predict the effect of drug combinations from single drug data. This is the first integration of high-throughput datasets using large-scale mechanistic models. We anticipate this to be the starting point for development of more comprehensive models allowing a deeper mechanistic insight.