ABSTRACT
Likelihood ratios are frequently used as a basis for statistical tests, for model selection criteria and for assessing parameter and prediction uncertainties, e.g. using the profile likelihood. However, translating these likelihood ratios into p-values or confidence intervals requires the exact form of the test statistic's distribution. Because this distribution is unknown for nonlinear ordinary differential equation (ODE) models, an approximation is required that assumes the so-called asymptotic setting, i.e. a sufficiently large amount of data. Since the amount of data from quantitative molecular biology is typically limited in applications, the finite-sample case regularly occurs for mechanistic models of dynamical systems, e.g. biochemical reaction networks or infectious disease models. Thus, it is unclear whether the standard approach of applying statistical thresholds derived for the asymptotic large-sample setting to realistic applications yields valid conclusions. In this study, empirical likelihood ratios for parameters from 19 published nonlinear ODE benchmark models are investigated using a resampling approach for the original data designs. Their distributions are compared to the asymptotic approximation and statistical thresholds are checked for conservativeness. It turns out that corrections of the likelihood ratios are required in such finite-sample applications in order to avoid anti-conservative results.
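The comparison between an empirical likelihood-ratio distribution and its asymptotic threshold can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the procedure of the study: the one-parameter decay model y = exp(-k t), the six-point design, the noise level, the grid search and all function names are hypothetical choices made for illustration. The noise standard deviation is treated as unknown and profiled out, so the statistic is n log(RSS_null / RSS_best).

```python
import math
import random


def rss(k, times, data):
    """Residual sum of squares for the toy model y = exp(-k * t)."""
    return sum((y - math.exp(-k * t)) ** 2 for t, y in zip(times, data))


def likelihood_ratio(times, data, k_true, grid):
    """Likelihood-ratio statistic for H0: k = k_true with unknown noise level.

    With Gaussian noise and the variance profiled out, -2 log(LR) reduces to
    n * log(RSS_null / RSS_best).
    """
    rss_null = rss(k_true, times, data)
    # Crude grid search; the grid contains k_true, so rss_best <= rss_null.
    rss_best = min(rss(k, times, data) for k in grid)
    return len(data) * math.log(rss_null / rss_best)


def empirical_threshold(n_sim=1000, level=0.95, seed=1):
    """Simulate data under H0 and return the sorted statistics and their quantile."""
    rng = random.Random(seed)
    times = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]   # assumed small design (n = 6)
    k_true, sigma = 1.0, 0.1                 # assumed true values
    grid = [0.2 + 0.01 * i for i in range(281)] + [k_true]
    stats = []
    for _ in range(n_sim):
        data = [math.exp(-k_true * t) + rng.gauss(0.0, sigma) for t in times]
        stats.append(likelihood_ratio(times, data, k_true, grid))
    stats.sort()
    index = min(int(level * n_sim), n_sim - 1)
    return stats, stats[index]
```

Calling `empirical_threshold()` returns the simulated statistics together with their empirical 95% quantile, which can be set against the asymptotic chi-square threshold for one degree of freedom (about 3.84). In a design this small the empirical quantile would be expected to lie above the asymptotic value, which is the anti-conservative situation the abstract describes.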
Subject(s)
Algorithms, Nonlinear Dynamics, Likelihood Functions, Uncertainty
ABSTRACT
MOTIVATION: A major goal of drug development is to selectively target certain cell types. Cellular decisions influenced by drugs are often dependent on the dynamic processing of information. Selective responses can be achieved by differences between the involved cell types at levels of receptor, signaling, gene regulation or further downstream. Therefore, a systematic approach to detect and quantify cell type-specific parameters in dynamical systems becomes necessary. RESULTS: Here, we demonstrate that a combination of nonlinear modeling with L1 regularization is capable of detecting cell type-specific parameters. To adapt the least-squares numerical optimization routine to L1 regularization, sub-gradient strategies as well as truncation of proposed optimization steps were implemented. Likelihood-ratio tests were used to determine the optimal regularization strength resulting in a sparse solution in terms of a minimal number of cell type-specific parameters that is in agreement with the data. By applying our implementation to a realistic dynamical benchmark model of the DREAM6 challenge we were able to recover parameter differences with an accuracy of 78%. Within the subset of detected differences, 91% were in agreement with their true value. Furthermore, we found that the results could be improved using the profile likelihood. In conclusion, the approach constitutes a general method to infer an overarching model with a minimum number of individual parameters for the particular models. AVAILABILITY AND IMPLEMENTATION: A MATLAB implementation is provided within the freely available, open-source modeling environment Data2Dynamics. Source code for all examples is provided online at http://www.data2dynamics.org/ CONTACT: bernhard.steiert@fdm.uni-freiburg.de.
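The way an L1 penalty produces a sparse set of cell type-specific parameters can be sketched on a deliberately simplified problem. The sketch below does not reproduce the sub-gradient and step-truncation strategy described in the abstract; it swaps in closed-form soft-thresholding, which solves the penalized problem exactly when each log fold-change r between two cell types enters a quadratic loss (d - r)^2 / (2 v) + strength * |r| on its own. The function names, the variance and the example differences are assumptions made for illustration.

```python
def soft_threshold(value, threshold):
    """Minimizer of 0.5 * (value - r)**2 + threshold * abs(r) over r."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def l1_fold_changes(log_differences, variance, strength):
    """Shrink estimated log fold-changes between two cell types.

    Each entry of log_differences is an unpenalized estimate d with the given
    variance. Minimizing (d - r)**2 / (2 * variance) + strength * abs(r)
    yields soft-thresholding at strength * variance; entries shrunk to exactly
    zero are treated as shared between the cell types.
    """
    threshold = strength * variance
    return [soft_threshold(d, threshold) for d in log_differences]


def cell_type_specific(log_differences, variance, strength):
    """Indices of parameters whose penalized fold-change remains non-zero."""
    shrunk = l1_fold_changes(log_differences, variance, strength)
    return [i for i, r in enumerate(shrunk) if r != 0.0]
```

For example, `cell_type_specific([0.05, -1.2, 0.02, 0.8], 0.01, 10.0)` keeps only the second and fourth parameters as cell type-specific. Increasing `strength` removes more differences, which is why the regularization strength has to be selected, in the abstract via likelihood-ratio tests.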
Subject(s)
Cells/classification, Drug Delivery Systems, Nonlinear Dynamics, Algorithms, Least-Squares Analysis, Probability, Programming Languages, Signal Transduction
ABSTRACT
Lung cancer, with its most prevalent form non-small-cell lung carcinoma (NSCLC), is one of the leading causes of cancer-related deaths worldwide, and is commonly treated with chemotherapeutic drugs such as cisplatin. Lung cancer patients frequently suffer from chemotherapy-induced anemia, which can be treated with erythropoietin (EPO). However, studies have indicated that EPO not only promotes erythropoiesis in hematopoietic cells, but may also enhance survival of NSCLC cells. Here, we verified that the NSCLC cell line H838 expresses functional erythropoietin receptors (EPOR) and that treatment with EPO reduces cisplatin-induced apoptosis. To pinpoint differences in EPO-induced survival signaling between erythroid progenitor cells (CFU-E, colony forming unit-erythroid) and H838 cells, we combined mathematical modeling with L1 regularization, a method for feature selection. Using an example model and simulated data, we demonstrated that this approach enables the accurate identification and quantification of cell type-specific parameters. We applied our strategy to quantitative time-resolved data of EPO-induced JAK/STAT signaling generated by quantitative immunoblotting, mass spectrometry and quantitative real-time PCR (qRT-PCR) in CFU-E and H838 cells as well as H838 cells overexpressing human EPOR (H838-HA-hEPOR). The established parsimonious mathematical model was able to simultaneously describe the data sets of CFU-E, H838 and H838-HA-hEPOR cells. Seven cell type-specific parameters were identified, including, for example, parameters for nuclear translocation of STAT5 and target gene induction. Cell type-specific differences in target gene induction were experimentally validated by qRT-PCR experiments.
The systematic identification of pathway differences and sensitivities of EPOR signaling in CFU-E and H838 cells revealed potential targets for intervention to selectively inhibit EPO-induced signaling in the tumor cells but leave the responses in erythroid progenitor cells unaffected. Thus, the proposed modeling strategy can be employed as a general procedure to identify cell type-specific parameters and to recommend treatment strategies for the selective targeting of specific cell types.
Subject(s)
Non-Small-Cell Lung Carcinoma/metabolism, Erythroid Cells/metabolism, Lung Neoplasms/metabolism, Erythropoietin Receptors, Signal Transduction/physiology, Non-Small-Cell Lung Carcinoma/genetics, Tumor Cell Line, Computational Biology, Erythroid Cells/cytology, Humans, Lung Neoplasms/genetics, Erythropoietin Receptors/analysis, Erythropoietin Receptors/classification, Erythropoietin Receptors/genetics, Erythropoietin Receptors/metabolism
ABSTRACT
Whole-cell models that explicitly represent all cellular components at the molecular level have the potential to predict phenotype from genotype. However, even for simple bacteria, whole-cell models will contain thousands of parameters, many of which are poorly characterized or unknown. New algorithms are needed to estimate these parameters and enable researchers to build increasingly comprehensive models. We organized the Dialogue for Reverse Engineering Assessments and Methods (DREAM) 8 Whole-Cell Parameter Estimation Challenge to develop new parameter estimation algorithms for whole-cell models. We asked participants to identify a subset of parameters of a whole-cell model given the model's structure and in silico "experimental" data. Here we describe the challenge, the best performing methods, and new insights into the identifiability of whole-cell models. We also describe several valuable lessons we learned toward improving future challenges. Going forward, we believe that collaborative efforts supported by inexpensive cloud computing have the potential to solve whole-cell model parameter estimation.
Subject(s)
Cells/metabolism, Biological Models, Algorithms, Bacteria/genetics, Bacteria/metabolism, Bioengineering, Cloud Computing, Computational Biology, Computer Simulation, Genetic Association Studies/statistics & numerical data, Mutation, Mycoplasma genitalium/genetics, Mycoplasma genitalium/metabolism
ABSTRACT
Bayesian historical borrowing has recently attracted growing interest due to the increasing availability of historical control data, as well as improved computational methodology and software. In this article, we argue that the statistical models used for borrowing may be suboptimal when they do not adjust for factors that differ across historical studies, such as covariates and dosing regimens. We propose an alternative approach to address these shortcomings. We start by constructing a historical model based on subject-level historical data to accurately characterize the control treatment by adjusting for known between-trial differences. This model is subsequently used to predict the control arm response in the current trial, enabling the derivation of a model-informed prior for the treatment effect parameter of another (potentially simpler) model used to analyze the trial efficacy (i.e. the trial model). Our approach is applied to neovascular age-related macular degeneration trials, employing a cross-sectional regression trial model, and a longitudinal non-linear mixed-effects drug-disease-trial historical model. The latter model characterizes the relationship between clinical response, drug exposure and baseline covariates so that the derived model-informed prior seamlessly adapts to the trial population and can be extrapolated to a different dosing regimen. This approach can yield a more accurate prior for borrowing, thus optimizing gains in efficiency (e.g. increasing power or reducing the sample size) in future trials.
Subject(s)
Macular Degeneration, Statistical Models, Humans, Bayes Theorem, Cross-Sectional Studies, Sample Size, Macular Degeneration/drug therapy, Research Design, Computer Simulation
ABSTRACT
Pharmacometrics and the application of population pharmacokinetic (PK) modeling play a crucial role in clinical pharmacology. These methods, which describe data with well-defined equations and estimate physiologically interpretable parameters, have not changed substantially during the past decades. Although the methods have proven their usefulness, they are often resource intensive and require a high level of expertise. We investigated whether a method based on artificial neural networks (ANNs) may provide an alternative approach for the prediction of concentration-time curves to supplement the gold standard methods. In this work, we used simulated data to overcome the requirement for a large clinical training data set, implemented a pharmacologically reasonable network architecture to improve extrapolation to different dosing schemes, and used transfer learning to quickly adapt the predictions to new patient groups. We demonstrate that ANNs are able to learn the shape of concentration-time curves and make individual predictions based on a short sequence of PK measurements. Furthermore, an ANN trained on simulated data was applied to real clinical data and was demonstrated to extrapolate to different dosing schemes. We also adapted the ANN trained on simulated healthy subjects to simulated hepatically impaired patients through transfer learning. In summary, we demonstrate how ANNs could be leveraged in a PK workflow to efficiently make individual concentration-time predictions, and we discuss the current limitations and advantages of such an ANN-based method.
Subject(s)
Computer Neural Networks, Humans, Workflow
ABSTRACT
In the last few years, machine learning (ML) and artificial intelligence have seen a new wave of publicity fueled by the huge and ever-increasing amount of data and computational power as well as the discovery of improved learning algorithms. However, the idea of a computer learning some abstract concept from data and applying it to yet unseen situations is not new and has been around at least since the 1950s. Many of these basic principles are very familiar to the pharmacometrics and clinical pharmacology community. In this paper, we want to introduce the foundational ideas of ML to this community such that readers obtain the essential tools they need to understand publications on the topic. Although we will not go into the full details and theoretical background, we aim to point readers to relevant literature and put applications of ML in molecular biology as well as the fields of pharmacometrics and clinical pharmacology into perspective.
Subject(s)
Machine Learning/trends, Theoretical Models, Clinical Pharmacology/trends, Cluster Analysis, Humans, Clinical Pharmacology/statistics & numerical data
ABSTRACT
Mechanistic models of biomolecular processes are established research tools that enable the quantitative investigation of dynamic features of biological processes such as signal transduction cascades. Often, these models aim at describing a large number of states, for instance concentrations of proteins and small molecules, as well as their interactions. Each modeled interaction increases the number of potentially unknown parameters such as reaction rate constants or initial amounts of proteins. In order to calibrate these mechanistic models, the unknown model parameters have to be estimated based on experimental data. The complexity of parameter estimation raises several computational challenges that can be tackled within the Data2Dynamics modeling environment. The environment is a well-tested, high-performance software package that is tailored to the modeling of biological processes with ordinary differential equation models and experimental biomolecular data. In this chapter, we introduce and provide "recipes" for the most frequent analyses and modeling tasks in the Data2Dynamics modeling environment. The presented protocols comprise model building, data handling, parameter estimation, calculation of confidence intervals, model selection and reduction, deriving prediction uncertainties, and designing informative novel experiments.
Subject(s)
Computational Biology/methods, Biological Models, Signal Transduction/genetics, Systems Biology/methods, Algorithms, Computer Simulation, Software
ABSTRACT
Recent advances in machine learning (ML) have led to enthusiasm about its use throughout the biopharmaceutical industry. The ML methods can be applied to a wide range of problems and have the potential to revolutionize aspects of drug development. The incorporation of ML in modeling and simulation (M&S) has been eagerly anticipated, and in this perspective, we highlight examples in which ML and M&S approaches can be integrated as complementary parts of a clinical pharmacology workflow.
Subject(s)
Deep Learning, Clinical Pharmacology/methods, Big Data, Computer Simulation, Humans, Theoretical Models
ABSTRACT
Extracellular growth factors signal to transcription factors via a limited number of cytoplasmic kinase cascades. It remains unclear how such cascades encode ligand identities and concentrations. In this paper, we use live-cell imaging and statistical modeling to study FOXO3, a transcription factor regulating diverse aspects of cellular physiology that is under combinatorial control. We show that FOXO3 nuclear-to-cytosolic translocation has two temporally distinct phases varying in magnitude with growth factor identity and cell type. These phases comprise synchronous translocation soon after ligand addition followed by an extended back-and-forth shuttling; this shuttling is pulsatile and does not have a characteristic frequency, unlike a simple oscillator. Early and late dynamics are differentially regulated by Akt and ERK and have low mutual information, potentially allowing the two phases to encode different information. In cancer cells in which ERK and Akt are dysregulated by oncogenic mutation, the diversity of states is lower.
Subject(s)
Forkhead Box Protein O3/metabolism, Forkhead Box Protein O3/physiology, Cell Line, Cytosol/metabolism, Forkhead Transcription Factors/metabolism, Humans, Intercellular Signaling Peptides and Proteins/metabolism, MAP Kinase Signaling System/physiology, MCF-7 Cells, Phosphorylation, Protein Transport, Proto-Oncogene Proteins c-akt/metabolism, Signal Transduction/physiology
ABSTRACT
Upon stimulation of cells with transforming growth factor β (TGF-β), Smad proteins form trimeric complexes and activate a broad spectrum of target genes. It remains unresolved which of the possible Smad complexes are formed in cellular contexts and how these contribute to gene expression. By combining quantitative mass spectrometry with a computational selection strategy, we predict and provide experimental evidence for the three most relevant Smad complexes in the mouse hepatoma cell line Hepa1-6. Utilizing dynamic pathway modeling, we specify the contribution of each Smad complex to the expression of representative Smad target genes, and show that these contributions are conserved in human hepatoma cell lines and primary hepatocytes. We predict, based on gene expression data of patient samples, increased amounts of Smad2/3/4 proteins and Smad2 phosphorylation as hallmarks of hepatocellular carcinoma and experimentally verify this prediction. Our findings demonstrate that modeling approaches can disentangle the complexity of transcription factor complex formation and its impact on gene expression.
Subject(s)
Smad Proteins/genetics, Aged, Animals, Hepatocellular Carcinoma/genetics, Hepatocellular Carcinoma/metabolism, Tumor Cell Line, DNA-Binding Proteins/genetics, Female, Hep G2 Cells, Hepatocytes/metabolism, Humans, Liver/metabolism, Liver Neoplasms/genetics, Liver Neoplasms/metabolism, Male, Mass Spectrometry/methods, Mice, Inbred C57BL Mice, Middle Aged, Phosphorylation, Signal Transduction, Smad Proteins/metabolism, Trans-Activators/genetics, Genetic Transcription, Transforming Growth Factor beta/metabolism
ABSTRACT
In systems biology, one of the major tasks is to tailor model complexity to the information content of the data. A useful model should describe the data and produce well-determined parameter estimates and predictions. A model that is too small cannot describe the data, whereas a model that is too large tends to overfit measurement errors and does not provide precise predictions. Typically, the model is modified and tuned to fit the data, which often results in an oversized model. To restore the balance between model complexity and available measurements, either new data has to be gathered or the model has to be reduced. In this manuscript, we present a data-based method for reducing non-linear models. The profile likelihood is utilised to assess parameter identifiability and designate likely candidates for reduction. Parameter dependencies are analysed along profiles, providing context-dependent suggestions for the type of reduction. We discriminate four distinct scenarios, each associated with a specific model reduction strategy. Iterating the presented procedure eventually results in an identifiable model, which is capable of generating precise and testable predictions. Source code for all toy examples is provided within the freely available, open-source modelling environment Data2Dynamics based on MATLAB available at http://www.data2dynamics.org/, as well as the R packages dMod/cOde available at https://github.com/dkaschek/. Moreover, the concept is generally applicable and can readily be used with any software capable of calculating the profile likelihood.
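How a profile exposes a non-identifiable parameter, and how the dependency along the profile hints at a reduction, can be sketched with a hypothetical toy model that is not taken from the manuscript: y = a * b * t, in which only the product a * b is determined by the data. The function name and the example data are assumptions. For each fixed value of a the nuisance parameter b is re-optimized, here in closed form, and the residual sum of squares is recorded as a stand-in for the negative log-likelihood under Gaussian noise of known variance.

```python
def profile_parameter_a(a_values, times, data):
    """Profile of parameter a in the toy model y = a * b * t.

    For each fixed a, b is re-optimized by least squares. Returns a list of
    (a, b_hat, rss) tuples: the profile of a and the path of b along it.
    """
    sum_ty = sum(t * y for t, y in zip(times, data))
    sum_tt = sum(t * t for t in times)
    profile = []
    for a in a_values:
        # Closed-form least-squares solution for b at this fixed a.
        b_hat = sum_ty / (a * sum_tt)
        rss = sum((y - a * b_hat * t) ** 2 for t, y in zip(times, data))
        profile.append((a, b_hat, rss))
    return profile


# Assumed example data, roughly following y = 2 * t.
times = [1.0, 2.0, 3.0, 4.0]
data = [2.1, 3.9, 6.2, 7.8]
profile = profile_parameter_a([0.5, 1.0, 2.0, 4.0], times, data)
```

In this toy case the residual sum of squares is the same for every value of a, so the profile is flat and a is not identifiable. Along the profile b_hat moves so that a * b_hat stays constant, which suggests replacing the two parameters by their product. This illustrates only the general idea and does not correspond to a particular one of the four scenarios distinguished in the manuscript.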
Subject(s)
Computer Simulation, Biological Models, Software, Systems Biology/methods, Algorithms, Nonlinear Dynamics
ABSTRACT
BACKGROUND: Accurate estimation of parameters of biochemical models is required to characterize the dynamics of molecular processes. This problem is intimately linked to identifying the most informative experiments for accomplishing such tasks. While significant progress has been made, effective experimental strategies for parameter identification and for distinguishing among alternative network topologies remain unclear. We approached these questions in an unbiased manner using a unique community-based approach in the context of the DREAM initiative (Dialogue for Reverse Engineering Assessment of Methods). We created an in silico test framework under which participants could probe a network with hidden parameters by requesting a range of experimental assays; results of these experiments were simulated according to a model of network dynamics only partially revealed to participants. RESULTS: We proposed two challenges; in the first, participants were given the topology and underlying biochemical structure of a 9-gene regulatory network and were asked to determine its parameter values. In the second challenge, participants were given an incomplete topology with 11 genes and asked to find three missing links in the model. In both challenges, a budget was provided to buy experimental data generated in silico with the model and mimicking the features of different common experimental techniques, such as microarrays and fluorescence microscopy. Data could be bought at any stage, allowing participants to implement an iterative loop of experiments and computation. CONCLUSIONS: A total of 19 teams participated in this competition. The results suggest that the combination of state-of-the-art parameter estimation and a varied set of experimental methods using a few datasets, mostly fluorescence imaging data, can accurately determine parameters of biochemical models of gene regulation. 
However, the task is considerably more difficult if the gene network topology is not completely defined, as in challenge 2. Importantly, we found that aggregating independent parameter predictions and network topology across submissions creates a solution that can be better than the one from the best-performing submission.
Subject(s)
Computational Biology/methods, Gene Regulatory Networks, Computer Simulation, Kinetics, Genetic Models, Time Factors
ABSTRACT
Systems biology aims to build quantitative models to address unresolved issues in molecular biology. In order to describe the behavior of biological cells adequately, gene regulatory networks (GRNs) are intensively investigated. As the validity of models built for GRNs depends crucially on the kinetic rates, various methods have been developed to estimate these parameters from experimental data. For this purpose, it is favorable to choose the experimental conditions yielding maximal information. However, existing experimental design principles often rely on unfulfilled mathematical assumptions or become computationally demanding with growing model complexity. To solve this problem, we combined advanced methods for parameter and uncertainty estimation with experimental design considerations. As a showcase, we optimized three simulated GRNs in one of the challenges from the Dialogue for Reverse Engineering Assessment and Methods (DREAM). This article presents our approach, which was recognized as the best performing procedure at the DREAM6 Estimation of Model Parameters challenge. For fast and reliable parameter estimation, local deterministic optimization of the likelihood was applied. We analyzed identifiability and precision of the estimates by calculating the profile likelihood. Furthermore, the profiles provided a way to uncover a selection of most informative experiments, from which the optimal one was chosen using additional criteria at every step of the design process. In conclusion, we provide a strategy for optimal experimental design and show its successful application on three highly nonlinear dynamic models. Although presented in the context of the GRNs to be inferred for the DREAM6 challenge, the approach is generic and applicable to most types of quantitative models in systems biology and other disciplines.