ABSTRACT
Amortized simulation-based neural posterior estimation provides a novel machine-learning-based approach for solving parameter estimation problems. It has been shown to be computationally efficient and able to handle complex models and data sets. Yet, the available approach cannot handle missing data, a case that is ubiquitous in experimental studies, and might provide incorrect posterior estimates. In this work, we discuss various ways of encoding missing data and integrate them into the training and inference process. We implement the approaches in the BayesFlow methodology, an amortized estimation framework based on invertible neural networks, and evaluate their performance on multiple test problems. We find that an approach in which the data vector is augmented with binary indicators of presence or absence of values performs the most robustly. Indeed, it also improved performance for the simpler problem of data sets with variable length. Accordingly, we demonstrate that amortized simulation-based inference approaches are applicable even with missing data, and we provide a guideline for their handling, which is relevant for a broad spectrum of applications.
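The binary-indicator encoding described in this abstract can be sketched as follows. This is a minimal illustration only, not the BayesFlow implementation: the function name `encode_missing` and the `fill_value` placeholder are assumptions introduced here.

```python
import math


def encode_missing(values, fill_value=0.0):
    """Augment a data vector with binary presence indicators.

    Each entry becomes a pair (value, indicator). Missing entries
    (None or NaN) are replaced by `fill_value` (a hypothetical
    placeholder) and flagged with 0.0; observed entries keep their
    value and are flagged with 1.0.
    """
    encoded = []
    for v in values:
        missing = v is None or (isinstance(v, float) and math.isnan(v))
        if missing:
            encoded.append((fill_value, 0.0))
        else:
            encoded.append((float(v), 1.0))
    return encoded


# Toy data vector with two missing entries.
print(encode_missing([1.5, None, float("nan"), 2]))
```

The indicator lets a network tell a genuinely observed value equal to the placeholder apart from a missing one, which a bare fill value cannot do.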
Subjects
Computational Biology , Computer Simulation , Neural Networks (Computer) , Computational Biology/methods , Humans , Machine Learning , Bayes Theorem , Algorithms , Statistical Data Interpretation
ABSTRACT
Approximate Bayesian Computation (ABC) is a widely applicable and popular approach to estimating unknown parameters of mechanistic models. As ABC analyses are computationally expensive, parallelization on high-performance infrastructure is often necessary. However, existing parallelization strategies leave computing resources unused at times and thus do not yet leverage them optimally. We present look-ahead scheduling, a wall-time minimizing parallelization strategy for ABC Sequential Monte Carlo algorithms, which avoids idle times of computing units by preemptive sampling of subsequent generations. This allows all available resources to be utilized. The strategy can be integrated with, e.g., adaptive distance function and summary statistic selection schemes, which is essential in practice. Our key contribution is the theoretical assessment of the strategy of preemptive sampling and the proof of unbiasedness. In addition, we provide an implementation and evaluate the strategy on different problems and numbers of parallel cores, showing speed-ups of typically 10-20% and up to 50% compared to the best established approach, with some variability. Thus, the proposed strategy makes it possible to improve the cost and run-time efficiency of ABC methods on high-performance infrastructure.
Subjects
Algorithms , Virion , Bayes Theorem , Monte Carlo Method
ABSTRACT
MOTIVATION: Biological tissues are dynamic and highly organized. Multi-scale models are helpful tools to analyse and understand the processes determining tissue dynamics. These models usually depend on parameters that need to be inferred from experimental data to achieve a quantitative understanding, to predict the response to perturbations, and to evaluate competing hypotheses. However, even advanced inference approaches such as approximate Bayesian computation (ABC) are difficult to apply due to the computational complexity of the simulation of multi-scale models. Thus, there is a need for a scalable pipeline for modeling, simulating, and parameterizing multi-scale models of multi-cellular processes. RESULTS: Here, we present FitMultiCell, a computationally efficient and user-friendly open-source pipeline that can handle the full workflow of modeling, simulating, and parameterizing multi-scale models of multi-cellular processes. The pipeline is modular and integrates the modeling and simulation tool Morpheus and the statistical inference tool pyABC. The easy integration of high-performance infrastructure allows scaling to computationally expensive problems. The introduction of a novel standard for the formulation of parameter inference problems for multi-scale models additionally ensures reproducibility and reusability. By applying the pipeline to multiple biological problems, we demonstrate its broad applicability, which will benefit in particular image-based systems biology. AVAILABILITY AND IMPLEMENTATION: FitMultiCell is available open-source at https://gitlab.com/fitmulticell/fit.
Subjects
Biological Models , Systems Biology , Bayes Theorem , Reproducibility of Results , Computer Simulation , Workflow
ABSTRACT
SUMMARY: Mechanistic models are important tools to describe and understand biological processes. However, they typically rely on unknown parameters, the estimation of which can be challenging for large and complex systems. pyPESTO is a modular framework for systematic parameter estimation, with scalable algorithms for optimization and uncertainty quantification. While tailored to ordinary differential equation problems, pyPESTO is broadly applicable to black-box parameter estimation problems. Besides its own implementations, it provides a unified interface to various popular simulation and inference methods. AVAILABILITY AND IMPLEMENTATION: pyPESTO is implemented in Python and is open-source under a 3-Clause BSD license. Code and documentation are available on GitHub (https://github.com/icb-dcm/pypesto).
Subjects
Algorithms , Software , Computer Simulation , Uncertainty , Documentation , Biological Models
ABSTRACT
BACKGROUND: Population-based serological studies make it possible to estimate the prevalence of SARS-CoV-2 infections despite a substantial number of mild or asymptomatic disease courses. This became even more relevant for decision making after vaccination started. The KoCo19 cohort has tracked the pandemic progress in the Munich general population for over two years, setting it apart in Europe. METHODS: Recruitment occurred during the initial pandemic wave, including 5313 participants above 13 years from private households in Munich. Four follow-ups were held at crucial times of the pandemic, with response rates of at least 70%. Participants filled in questionnaires on socio-demographics and potential risk factors of infection. From Follow-up 2, information on SARS-CoV-2 vaccination was added. SARS-CoV-2 antibody status was measured using the Roche Elecsys® Anti-SARS-CoV-2 anti-N assay (indicating previous infection) and the Roche Elecsys® Anti-SARS-CoV-2 anti-S assay (indicating previous infection and/or vaccination). This allowed us to distinguish between sources of acquired antibodies. RESULTS: The SARS-CoV-2 estimated cumulative sero-prevalence increased from 1.6% (1.1-2.1%) in May 2020 to 14.5% (12.7-16.2%) in November 2021. Underreporting with respect to official numbers fluctuated with testing policies and capacities, becoming a factor of more than two during the second half of 2021. Simultaneously, the vaccination campaign against the SARS-CoV-2 virus increased the percentage of the Munich population having antibodies, with 86.8% (85.5-87.9%) having developed anti-S and/or anti-N in November 2021. Incidence rates for infections after (BTI) and without previous vaccination (INS) differed (ratio INS/BTI of 2.1, 0.7-3.6). However, the prevalence of infections was higher in the non-vaccinated population than in the vaccinated one. Considering the whole follow-up time, being born outside Germany, working in a high-risk job and living area per inhabitant were identified as risk factors for infection, while other socio-demographic and health-related variables were not. Although we obtained significant within-household clustering of SARS-CoV-2 cases, no further geospatial clustering was found. CONCLUSIONS: Vaccination increased the coverage of the Munich population presenting SARS-CoV-2 antibodies, but breakthrough infections contribute to community spread. As underreporting stays relevant over time, infections can go undetected, so non-pharmaceutical measures are crucial, particularly for highly contagious strains like Omicron.
Subjects
COVID-19 , Humans , COVID-19/epidemiology , COVID-19/prevention & control , SARS-CoV-2 , Hepatitis Delta Virus , COVID-19 Vaccines , Pandemics , Viral Antibodies
ABSTRACT
Calibrating model parameters on heterogeneous data can be challenging and inefficient. This holds especially for likelihood-free methods such as approximate Bayesian computation (ABC), which rely on the comparison of relevant features in simulated and observed data and are popular for otherwise intractable problems. To address this problem, methods have been developed to scale-normalize data, and to derive informative low-dimensional summary statistics using inverse regression models of parameters on data. However, while approaches only correcting for scale can be inefficient on partly uninformative data, the use of summary statistics can lead to information loss and relies on the accuracy of employed methods. In this work, we first show that the combination of adaptive scale normalization with regression-based summary statistics is advantageous on heterogeneous parameter scales. Second, we present an approach employing regression models not to transform data, but to inform sensitivity weights quantifying data informativeness. Third, we discuss problems for regression models under non-identifiability, and present a solution using target augmentation. We demonstrate improved accuracy and efficiency of the presented approach on various problems, in particular robustness and wide applicability of the sensitivity weights. Our findings demonstrate the potential of the adaptive approach. The developed algorithms have been made available in the open-source Python toolbox pyABC.
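Scale normalization of heterogeneous data can be illustrated with a small sketch. It assumes one common choice, weighting each data coordinate by the inverse of its median absolute deviation (MAD) across simulations. The abstract does not state which scale function is used, and the function names below are hypothetical rather than part of the pyABC API.

```python
import statistics


def mad(values):
    """Median absolute deviation from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def scale_weights(simulations):
    """One weight per data coordinate: the inverse MAD across simulations.

    `simulations` is a list of equal-length data vectors. Coordinates
    with zero spread get weight 0.0 (an assumption made here to avoid
    division by zero).
    """
    weights = []
    for column in zip(*simulations):
        scale = mad(column)
        weights.append(1.0 / scale if scale > 0 else 0.0)
    return weights


def weighted_distance(x, y, weights):
    """Weighted L1 distance between two data vectors."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, x, y))


# Two coordinates on very different scales.
sims = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
w = scale_weights(sims)
print(w, weighted_distance(sims[0], sims[2], w))
```

With these weights, both coordinates contribute comparably to the distance even though their raw scales differ by a factor of 100. Scale alone does not indicate whether a coordinate is informative about the parameters, which is the limitation the sensitivity weights in the abstract address.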
Subjects
Algorithms , Computer Simulation , Bayes Theorem
ABSTRACT
Mathematical models have been widely used during the ongoing SARS-CoV-2 pandemic for data interpretation, forecasting, and policy making. However, most models are based on officially reported case numbers, which depend on test availability and test strategies. The time dependence of these factors renders interpretation difficult and might even result in estimation biases. Here, we present a computational modelling framework that allows for the integration of reported case numbers with seroprevalence estimates obtained from representative population cohorts. To account for the time dependence of infection and testing rates, we embed flexible splines in an epidemiological model. The parameters of these splines are estimated, along with the other parameters, from the available data using a Bayesian approach. The application of this approach to the official case numbers reported for Munich (Germany) and the seroprevalence reported by the prospective COVID-19 Cohort Munich (KoCo19) provides first estimates for the time dependence of the under-reporting factor. Furthermore, we estimate how the effectiveness of non-pharmaceutical interventions and of the testing strategy evolves over time. Overall, our results show that the integration of temporally highly resolved and representative data is beneficial for accurate epidemiological analyses.
Subjects
COVID-19 , Humans , COVID-19/epidemiology , SARS-CoV-2 , Seroepidemiologic Studies , Bayes Theorem , Theoretical Models
ABSTRACT
Countries around the world implement nonpharmaceutical interventions (NPIs) to mitigate the spread of COVID-19. Design of efficient NPIs requires identification of the structure of the disease transmission network. Here, we identify the key parameters of the COVID-19 transmission network for time periods before, during, and after the application of strict NPIs for the first wave of COVID-19 infections in Germany, combining Bayesian parameter inference with an agent-based epidemiological model. We assume a Watts-Strogatz small-world network, which makes it possible to distinguish contacts within clustered cliques from unclustered, random contacts in the population; the latter have been shown to be crucial in sustaining the epidemic. In contrast to other works, which use coarse-grained network structures from anonymized data, like cell phone data, we consider the contacts of individual agents explicitly. We show that NPIs drastically reduced random contacts in the transmission network, increased network clustering, and resulted in a previously unappreciated transition from an exponential to a constant regime of new cases. In this regime, the disease spreads like a wave with a finite wave speed that depends on the number of contacts in a nonlinear fashion, which we can predict by mean field theory.
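For readers unfamiliar with the Watts-Strogatz construction, the following is a minimal sketch of the standard procedure: a ring lattice whose edges are rewired at random with probability p. It is not the agent-based model of the study, and the function name and parameter values are illustrative assumptions.

```python
import random


def watts_strogatz(n, k, p, rng):
    """Build a Watts-Strogatz small-world graph as an adjacency dict.

    n: number of nodes; k: even number of ring neighbours per node
    (requires n > k); p: probability of rewiring each lattice edge to
    a random node; rng: a random.Random instance.
    """
    neighbours = {i: set() for i in range(n)}
    # Ring lattice: each node is linked to its k/2 nearest nodes on each side.
    for i in range(n):
        for j in range(1, k // 2 + 1):
            neighbours[i].add((i + j) % n)
            neighbours[(i + j) % n].add(i)
    # Rewire each lattice edge (i, i + j) with probability p.
    for j in range(1, k // 2 + 1):
        for i in range(n):
            old = (i + j) % n
            if old in neighbours[i] and rng.random() < p:
                candidates = [c for c in range(n)
                              if c != i and c not in neighbours[i]]
                if candidates:
                    new = rng.choice(candidates)
                    neighbours[i].discard(old)
                    neighbours[old].discard(i)
                    neighbours[i].add(new)
                    neighbours[new].add(i)
    return neighbours


graph = watts_strogatz(n=20, k=4, p=0.3, rng=random.Random(1))
print(sum(len(v) for v in graph.values()) // 2)  # number of edges
```

At p = 0 all contacts are local and clustered; increasing p converts a growing share of them into random long-range contacts, the two contact types the abstract distinguishes.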
Subjects
COVID-19 , Cluster Analysis , Epidemics , Humans
ABSTRACT
A number of seroassays are available for SARS-CoV-2 testing; yet, head-to-head evaluations of different testing principles are limited, especially using raw values rather than categorical data. In addition, identifying correlates of protection is of utmost importance, and comparisons of available testing systems with functional assays, such as direct viral neutralisation, are needed. We analysed 6658 samples consisting of true-positives (n=193), true-negatives (n=1091), and specimens of unknown status (n=5374). For primary testing, we used Euroimmun-Anti-SARS-CoV-2-ELISA-IgA/IgG and Roche-Elecsys-Anti-SARS-CoV-2. Subsequently, virus-neutralisation, GeneScript cPass, VIRAMED-SARS-CoV-2-ViraChip, and Mikrogen-recomLine-SARS-CoV-2-IgG were applied for confirmatory testing. Statistical modelling generated optimised assay cut-off thresholds. Sensitivity of Euroimmun-anti-S1-IgA was 64.8%, specificity 93.3% (manufacturer's cut-off); for Euroimmun-anti-S1-IgG, sensitivity was 77.2/79.8% (manufacturer's/optimised cut-offs), specificity 98.0/97.8%; Roche-anti-N sensitivity was 85.5/88.6%, specificity 99.8/99.7%. In true-positives, mean and median Euroimmun-anti-S1-IgA and -IgG titres decreased 30/90 days after RT-PCR-positivity, whereas Roche-anti-N titres decreased significantly later. Virus-neutralisation was 80.6% sensitive, 100.0% specific (≥1:5 dilution). Neutralisation surrogate tests (GeneScript cPass, Mikrogen-recomLine-RBD) were >94.9% sensitive and >98.1% specific. Optimised cut-offs improved test performances of several tests. Confirmatory testing with virus-neutralisation might be complemented with GeneScript cPass™ or recomLine-RBD for certain applications. Head-to-head comparisons given here aim to contribute to the refinement of testing strategies for individual and public health use.
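The dependence of sensitivity and specificity on the assay cut-off can be made concrete with a short sketch. The raw values below are invented toy numbers, not data from the study, and the rule "at or above the cut-off counts as positive" is an assumption.

```python
def sensitivity_specificity(positive_values, negative_values, cutoff):
    """Sensitivity and specificity of a threshold test.

    positive_values: raw assay values of known true-positive samples.
    negative_values: raw assay values of known true-negative samples.
    A sample is called positive when its value is >= cutoff (assumed rule).
    """
    true_pos = sum(v >= cutoff for v in positive_values)
    true_neg = sum(v < cutoff for v in negative_values)
    return true_pos / len(positive_values), true_neg / len(negative_values)


# Toy raw values (hypothetical).
positives = [0.5, 1.2, 2.0, 3.1]
negatives = [0.1, 0.2, 0.9, 1.3]
for cutoff in (0.4, 1.1):
    print(cutoff, sensitivity_specificity(positives, negatives, cutoff))
```

Lowering the cut-off raises sensitivity at the cost of specificity, which is the trade-off an optimised threshold has to balance.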
Subjects
COVID-19 Serological Testing/methods , COVID-19/diagnosis , Neutralization Tests/methods , SARS-CoV-2/immunology , COVID-19 Nucleic Acid Testing , Cohort Studies , Humans
ABSTRACT
BACKGROUND: In the 2nd year of the COVID-19 pandemic, knowledge about the dynamics of the infection in the general population is still limited. Such information is essential for health planners, as many of those infected show no or only mild symptoms and thus escape the surveillance system. We therefore aimed to describe the course of the pandemic in the Munich general population living in private households from April 2020 to January 2021. METHODS: The KoCo19 baseline study took place from April to June 2020 and included 5313 participants (age 14 years and above). From November 2020 to January 2021, we could again measure SARS-CoV-2 antibody status in 4433 of the baseline participants (response 83%). Participants were offered a self-sampling kit to take a capillary blood sample (dry blood spot; DBS). Blood was analysed using the Elecsys® Anti-SARS-CoV-2 assay (Roche). Questionnaire information on socio-demographics and potential risk factors assessed at baseline was available for all participants. In addition, follow-up information on health-risk taking behaviour and number of personal contacts outside the household (N = 2768) as well as leisure time activities (N = 1263) was collected in summer 2020. RESULTS: Weighted and adjusted (for specificity and sensitivity) SARS-CoV-2 sero-prevalence at follow-up was 3.6% (95% CI 2.9-4.3%) as compared to 1.8% (95% CI 1.3-3.4%) at baseline. 91% of those who tested positive at baseline were also antibody-positive at follow-up. While sero-prevalence increased from early November 2020 to January 2021, no indication of geospatial clustering across the city of Munich was found, although cases clustered within households. Taking baseline result and time to follow-up into account, men and participants in the age group 20-34 years were at the highest risk of sero-positivity. In the sensitivity analyses, differences in health-risk taking behaviour, number of personal contacts and leisure time activities partly explained these differences. CONCLUSION: The number of citizens in Munich with SARS-CoV-2 antibodies was still below 5% during the 2nd wave of the pandemic. Antibodies remained present in the majority of SARS-CoV-2 sero-positive baseline participants. Besides age and sex, potentially confounded by differences in behaviour, no major risk factors could be identified. Non-pharmaceutical public health measures are thus still important.
Subjects
COVID-19 , Pandemics , Follow-Up Studies , Germany/epidemiology , Humans , Newborn Infant , Male , SARS-CoV-2
ABSTRACT
The hepatitis C virus (HCV) is capable of spreading within a host by two different transmission modes: cell-free and cell-to-cell. However, the contribution of each of these transmission mechanisms to HCV spread is unknown. To dissect the contribution of these different transmission modes to HCV spread, we measured HCV lifecycle kinetics and used an in vitro spread assay to monitor HCV spread kinetics after a low multiplicity of infection in the absence and presence of a neutralizing antibody that blocks cell-free spread. By analyzing these data with a spatially explicit mathematical model that describes viral spread on a single-cell level, we quantified the contribution of cell-free and cell-to-cell spread to the overall infection dynamics and show that both transmission modes act synergistically to enhance the spread of infection. Thus, the simultaneous occurrence of both transmission modes represents an advantage for HCV that may contribute to viral persistence. Notably, the relative contribution of each viral transmission mode appeared to vary depending on the experimental conditions, suggesting that viral spread is optimized according to the environment. Together, our analyses provide insight into the spread dynamics of HCV and reveal how different transmission modes impact each other.
Subjects
Hepacivirus/physiology , Hepatitis C/physiopathology , Hepatitis C/virology , Host Microbial Interactions , Tumor Cell Line , Humans , Kinetics , Theoretical Models , Virus Internalization
ABSTRACT
SUMMARY: Ordinary differential equation models facilitate the understanding of cellular signal transduction and other biological processes. However, for large and comprehensive models, the computational cost of simulating or calibrating can be limiting. AMICI is a modular toolbox implemented in C++/Python/MATLAB that provides efficient simulation and sensitivity analysis routines tailored for scalable, gradient-based parameter estimation and uncertainty quantification. AVAILABILITY AND IMPLEMENTATION: AMICI is published under the permissive BSD-3-Clause license with source code publicly available on https://github.com/AMICI-dev/AMICI. Citeable releases are archived on Zenodo. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
ABSTRACT
Given the large number of mild or asymptomatic SARS-CoV-2 cases, only population-based studies can provide reliable estimates of the magnitude of the pandemic. We therefore aimed to assess the sero-prevalence of SARS-CoV-2 in the Munich general population after the first wave of the pandemic. For this purpose, we drew a representative sample of 2994 private households and invited household members 14 years and older to complete questionnaires and to provide blood samples. SARS-CoV-2 seropositivity was defined as Roche N pan-Ig ≥ 0.4218. We adjusted the prevalence for the sampling design, sensitivity, and specificity. We investigated risk factors for SARS-CoV-2 seropositivity and geospatial transmission patterns by generalized linear mixed models and permutation tests. Seropositivity for SARS-CoV-2-specific antibodies was 1.82% (95% confidence interval (CI) 1.28-2.37%) as compared to 0.46% PCR-positive cases officially registered in Munich. Loss of the sense of smell or taste was associated with seropositivity (odds ratio (OR) 47.4; 95% CI 7.2-307.0), and infections clustered within households. With this first population-based study on SARS-CoV-2 prevalence in a large German municipality not affected by a superspreading event, we could show that at least one in four cases in private households was reported and known to the health authorities. These results will help authorities to estimate the true burden of disease in the population and to make evidence-based decisions on public health measures.
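Adjusting an observed prevalence for imperfect test sensitivity and specificity is often done with the Rogan-Gladen estimator. The abstract does not say which correction the authors applied, so the sketch below is an assumed, standard illustration with invented input values; it ignores the sampling-design weighting entirely.

```python
def rogan_gladen(raw_prevalence, sensitivity, specificity):
    """Correct an apparent prevalence for test error (Rogan-Gladen).

    true = (raw + specificity - 1) / (sensitivity + specificity - 1),
    clipped to the interval [0, 1].
    """
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("test must perform better than chance")
    adjusted = (raw_prevalence + specificity - 1.0) / denominator
    return min(1.0, max(0.0, adjusted))


# Hypothetical inputs: 5% apparent prevalence, 90% sensitive, 98% specific.
print(round(rogan_gladen(0.05, 0.90, 0.98), 4))
```

At low prevalence the specificity term dominates: even a small false-positive rate accounts for a sizeable share of the apparent positives.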
Subjects
COVID-19 , Coronavirus Infections , Humans , Prevalence , Risk Factors , SARS-CoV-2
ABSTRACT
Reproducibility and reusability of the results of data-based modeling studies are essential. Yet, there has so far been no broadly supported format for the specification of parameter estimation problems in systems biology. Here, we introduce PEtab, a format which facilitates the specification of parameter estimation problems using Systems Biology Markup Language (SBML) models and a set of tab-separated value files describing the observation model and experimental data as well as parameters to be estimated. We have already implemented PEtab support in eight well-established model simulation and parameter estimation toolboxes with hundreds of users in total. We provide a Python library for validation and modification of a PEtab problem and currently 20 example parameter estimation problems based on recent studies.
Subjects
Programming Languages , Systems Biology/methods , Algorithms , Factual Databases , Biological Models , Statistical Models , Reproducibility of Results
ABSTRACT
Ordinary differential equation (ODE) models are a key tool to understand complex mechanisms in systems biology. These models are studied using various approaches, including stability and bifurcation analysis, but most frequently by numerical simulations. The number of required simulations is often large, e.g., when unknown parameters need to be inferred. This renders efficient and reliable numerical integration methods essential. However, these methods depend on various hyperparameters, which strongly impact the ODE solution. Despite this, and although hundreds of published ODE models are freely available in public databases, a thorough study that quantifies the impact of hyperparameters on the ODE solver in terms of accuracy and computation time is still missing. In this manuscript, we investigate which choices of algorithms and hyperparameters are generally favorable when dealing with ODE models arising from biological processes. To ensure a representative evaluation, we considered 142 published models. Our study provides evidence that most ODEs in computational biology are stiff, and we give guidelines for the choice of algorithms and hyperparameters. We anticipate that our results will help researchers in systems biology to choose appropriate numerical methods when dealing with ODE models.
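Why stiffness matters for the choice of integration method can be seen on the linear test equation y' = -λy. The sketch below compares hand-written explicit and implicit Euler steps. It is a toy illustration with assumed values for λ and the step size, not one of the solvers or models benchmarked in the study.

```python
def explicit_euler(lam, y0, h, steps):
    """Explicit Euler for y' = -lam * y: y_next = y + h * (-lam * y)."""
    y = y0
    for _ in range(steps):
        y = y + h * (-lam * y)
    return y


def implicit_euler(lam, y0, h, steps):
    """Implicit Euler for y' = -lam * y: y_next = y / (1 + h * lam)."""
    y = y0
    for _ in range(steps):
        y = y / (1.0 + h * lam)
    return y


# Stiff setting (assumed values): fast decay rate, comparatively large step.
lam, h, steps = 1000.0, 0.01, 10
print(explicit_euler(lam, 1.0, h, steps))  # oscillates and grows in magnitude
print(implicit_euler(lam, 1.0, h, steps))  # decays towards zero
```

The exact solution decays monotonically to zero. Explicit Euler is only stable here for h < 2/λ, so on stiff problems explicit methods are forced into very small steps, whereas implicit methods remain stable at larger ones.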
ABSTRACT
MOTIVATION: Approximate Bayesian computation (ABC) is an increasingly popular method for likelihood-free parameter inference in systems biology and other fields of research, as it allows analyzing complex stochastic models. However, the introduced approximation error is often not clear. It has been shown that ABC actually gives exact inference under the implicit assumption of a measurement noise model. Noise being common in biological systems, it is intriguing to exploit this insight. But this is difficult in practice, as ABC is in general highly computationally demanding. Thus, the question we want to answer here is how to efficiently account for measurement noise in ABC. RESULTS: We illustrate by example how ABC yields erroneous parameter estimates when neglecting measurement noise. Then, we discuss practical ways of correctly including the measurement noise in the analysis. We present an efficient adaptive sequential importance sampling-based algorithm applicable to various model types and noise models. We test and compare it on several models, including ordinary and stochastic differential equations, Markov jump processes and stochastically interacting agents, and noise models including normal, Laplace and Poisson noise. We conclude that the proposed algorithm could improve the accuracy of parameter estimates for a broad spectrum of applications. AVAILABILITY AND IMPLEMENTATION: The developed algorithms are made publicly available as part of the open-source Python toolbox pyABC (https://github.com/icb-dcm/pyabc). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
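The idea that ABC is exact under a measurement noise model can be sketched with a plain rejection sampler: a simulation is accepted with probability equal to the noise likelihood of the observed data divided by its maximum. The sketch assumes independent normal noise with known standard deviation. It is deliberately not the adaptive sequential importance sampling algorithm of the paper, and all function names are hypothetical rather than pyABC API.

```python
import math
import random


def acceptance_probability(simulated, observed, sigma):
    """Normal noise likelihood of `observed` given noise-free `simulated`,
    divided by its maximum value, so the result lies in (0, 1]."""
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return math.exp(-squared_error / (2.0 * sigma ** 2))


def rejection_sample(prior_sample, simulate, observed, sigma, n_samples, rng):
    """Draw posterior samples by accepting each prior draw with the
    normalized noise likelihood as acceptance probability."""
    accepted = []
    while len(accepted) < n_samples:
        theta = prior_sample(rng)
        prob = acceptance_probability(simulate(theta), observed, sigma)
        if rng.random() < prob:
            accepted.append(theta)
    return accepted


# Toy problem (assumed): uniform prior on [0, 2], identity model, one data point.
rng = random.Random(0)
samples = rejection_sample(
    prior_sample=lambda r: r.uniform(0.0, 2.0),
    simulate=lambda theta: [theta],
    observed=[1.0],
    sigma=0.5,
    n_samples=200,
    rng=rng,
)
print(sum(samples) / len(samples))  # sample mean of the accepted parameters
```

The acceptance rate of this scheme collapses as the number of data points grows or the noise shrinks, which is why a practical method needs an adaptive sequential algorithm.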
Subjects
Algorithms , Systems Biology , Bayes Theorem , Markov Chains , Probability
ABSTRACT
MOTIVATION: Mechanistic models of biochemical reaction networks facilitate the quantitative understanding of biological processes and the integration of heterogeneous datasets. However, some biological processes require the consideration of comprehensive reaction networks and therefore large-scale models. Parameter estimation for such models poses great challenges, in particular when the data are on a relative scale. RESULTS: Here, we propose a novel hierarchical approach combining (i) the efficient analytic evaluation of optimal scaling, offset and error model parameters with (ii) the scalable evaluation of objective function gradients using adjoint sensitivity analysis. We evaluate the properties of the methods by parameterizing a pan-cancer ordinary differential equation model (>1000 state variables, >4000 parameters) using relative protein, phosphoprotein and viability measurements. The hierarchical formulation improves optimizer performance considerably. Furthermore, we show that this approach allows estimating error model parameters with negligible computational overhead when no experimental estimates are available, providing an unbiased way to weight heterogeneous data. Overall, our hierarchical formulation is applicable to a wide range of models, and allows for the efficient parameterization of large-scale models based on heterogeneous relative measurements. AVAILABILITY AND IMPLEMENTATION: Supplementary code and data are available online at http://doi.org/10.5281/zenodo.3254429 and http://doi.org/10.5281/zenodo.3254441. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
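The analytic evaluation of an optimal scaling parameter can be illustrated for the simplest case: measurements y_i modelled as s * h_i plus independent Gaussian noise with known standard deviations sigma_i, where h_i are the model simulations. Setting the derivative of the weighted least-squares objective to zero gives s = sum(y_i h_i / sigma_i^2) / sum(h_i^2 / sigma_i^2). The function below is a sketch under these assumptions, with invented toy numbers; the offsets and error model parameters mentioned in the abstract are not covered.

```python
def optimal_scaling(measurements, simulations, sigmas):
    """Closed-form scaling s minimizing sum(((y - s * h) / sigma) ** 2).

    measurements: observed relative data y_i.
    simulations: unscaled model outputs h_i.
    sigmas: known noise standard deviations sigma_i.
    """
    numerator = sum(y * h / sig ** 2
                    for y, h, sig in zip(measurements, simulations, sigmas))
    denominator = sum(h * h / sig ** 2
                      for h, sig in zip(simulations, sigmas))
    if denominator == 0:
        raise ValueError("simulations are all zero; scaling is undefined")
    return numerator / denominator


# Toy data (hypothetical): measurements are exactly twice the simulation.
print(optimal_scaling([2.0, 4.0, 6.0], [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))
```

Because the scaling is available in closed form for any fixed set of dynamic parameters, the outer optimizer only has to search over the dynamic parameters, which is the kind of problem reduction the hierarchical formulation exploits.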