ABSTRACT
We propose a multivariate GARCH model for non-stationary health time series, obtained by modifying the observation-level variance of the standard state space model. The proposed model offers an intuitive and novel way of handling heteroskedastic data by exploiting the conditional structure of state space models. Inference follows the Bayesian paradigm: we use Markov chain Monte Carlo methods to obtain samples from the resulting posterior distribution, and the forward filtering backward sampling algorithm to sample efficiently from the posterior distribution of the latent state. The proposed model also handles missing data in a fully Bayesian fashion. We validate our model on synthetic data and analyze data from an intensive care unit in a Montreal hospital and from the MIMIC dataset. We further show that the proposed models outperform standard state space models in terms of the widely applicable information criterion (WAIC). The proposed model thus provides a new way to model multivariate heteroskedastic non-stationary time series, and models can then be compared easily using the WAIC.
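To make the forward filtering backward sampling (FFBS) step concrete, the Python sketch below draws the latent state of a deliberately simplified univariate local-level model, y_t = θ_t + v_t and θ_t = θ_{t-1} + w_t, with v_t ~ N(0, σ²_t) and w_t ~ N(0, W). It assumes a GARCH(1,1)-style recursion for σ²_t with hypothetical parameters ω, α, β held fixed; in a full MCMC sampler these parameters and W would be updated in their own steps, and the authors' model is multivariate. Missing observations (NaN) skip the update step, so the filter simply propagates the state through them. The helper names `garch_variance` and `ffbs_local_level` are illustrative only.

```python
import numpy as np


def garch_variance(resid, omega, alpha, beta):
    """GARCH(1,1)-style recursion: sigma2[t] = omega + alpha * e[t-1]**2 + beta * sigma2[t-1].

    A missing residual (NaN) is replaced by its conditional expectation,
    e[t-1]**2 -> sigma2[t-1]; this is an assumption of the sketch.
    """
    n = len(resid)
    sigma2 = np.empty(n)
    sigma2[0] = omega / (1.0 - alpha - beta)  # unconditional variance (needs alpha + beta < 1)
    for t in range(1, n):
        e2 = sigma2[t - 1] if np.isnan(resid[t - 1]) else resid[t - 1] ** 2
        sigma2[t] = omega + alpha * e2 + beta * sigma2[t - 1]
    return sigma2


def ffbs_local_level(y, sigma2, W, m0=0.0, C0=1e6, rng=None):
    """One FFBS draw of theta for y[t] = theta[t] + v[t], theta[t] = theta[t-1] + w[t],
    with v[t] ~ N(0, sigma2[t]) and w[t] ~ N(0, W). NaN entries of y are missing."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(y)
    m, C = np.empty(n), np.empty(n)
    m_prev, C_prev = m0, C0
    for t in range(n):                      # forward filtering (Kalman filter)
        a, R = m_prev, C_prev + W           # prior moments of theta[t]
        if np.isnan(y[t]):                  # missing: no update, only propagate
            m[t], C[t] = a, R
        else:
            K = R / (R + sigma2[t])         # Kalman gain
            m[t] = a + K * (y[t] - a)
            C[t] = R * sigma2[t] / (R + sigma2[t])
        m_prev, C_prev = m[t], C[t]
    theta = np.empty(n)                     # backward sampling
    theta[-1] = rng.normal(m[-1], np.sqrt(C[-1]))
    for t in range(n - 2, -1, -1):
        B = C[t] / (C[t] + W)               # smoothing gain
        h = m[t] + B * (theta[t + 1] - m[t])
        H = C[t] * W / (C[t] + W)
        theta[t] = rng.normal(h, np.sqrt(H))
    return theta


# Illustration on simulated data (all parameter values are hypothetical)
rng = np.random.default_rng(1)
T = 200
theta_true = np.cumsum(rng.normal(0.0, 0.1, T))
y = theta_true + rng.normal(0.0, 0.5, T)
y[50:55] = np.nan                           # a short run of missing observations
sigma2 = np.full(T, 0.25)
for _ in range(5):                          # a few alternating updates, not a full sampler
    theta = ffbs_local_level(y, sigma2, W=0.01, rng=rng)
    sigma2 = garch_variance(y - theta, omega=0.05, alpha=0.1, beta=0.7)
```

The loop alternates FFBS draws of θ with recomputation of σ²_t from the residuals; it illustrates the conditional structure of the model rather than producing a converged posterior sample.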
Subjects
Bayes Theorem , Critical Care , Intensive Care Units , Markov Chains , Models, Statistical , Monte Carlo Method , Humans , Multivariate Analysis , Critical Care/statistics & numerical data , Critical Care/methods , Algorithms , Computer Simulation , Quebec
ABSTRACT
DNA methylation plays an essential role in regulating gene activity, modulating disease risk, and determining treatment response. Next-generation sequencing technologies give insight into methylation patterns at single-nucleotide resolution. However, complex features inherent in the data obtained via these technologies pose challenges beyond the typical big data problems. Identifying differentially methylated cytosines (dmc) or regions is one such challenge. To tackle these challenges, we have developed DMCFB, an efficient dmc identification method based on Bayesian functional regression. Using simulations, we establish that DMCFB outperforms current methods and results in better smoothing and efficient imputation. We analyzed a dataset of patients with acute promyelocytic leukemia and control samples. With DMCFB, we discovered many new dmcs and, more importantly, observed greater consistency of differential methylation within islands and their adjacent shores. Additionally, we detected differential methylation at more of the binding sites of the fused gene involved in this cancer.
Subjects
Bayes Theorem , DNA Methylation , Epigenesis, Genetic , DNA Methylation/genetics , Humans , Leukemia, Promyelocytic, Acute/genetics
ABSTRACT
Despite a growing body of literature on recruitment modeling for multicenter studies, statistical models to predict enrollment are rarely used in practice, and when they are, they often rely on unrealistic assumptions. The time-dependent Poisson-Gamma model (tPG) is a recently developed, flexible methodology that allows analysts to predict recruitment in an ongoing multicenter trial; its performance has been validated on data from a cohort study. In this article, we illustrate and further validate the tPG model on recruitment data from randomized controlled trials. In the appendix, we also provide a practical, easy-to-follow guide to its implementation via the tPG R package. To validate the model, we assess the predictive performance of the proposed methodology in forecasting the recruitment process of two HIV vaccine trials conducted by the HIV Vaccine Trials Network in several sub-Saharan African countries.
Subjects
AIDS Vaccines , HIV Infections , Models, Statistical , Patient Selection , Humans , AIDS Vaccines/therapeutic use , Poisson Distribution , Multicenter Studies as Topic/methods , Randomized Controlled Trials as Topic/methods , Time Factors , Forecasting , Africa South of the Sahara
ABSTRACT
Many cohort studies in survival analysis have embedded in them subcohorts consisting of incident cases and prevalent cases. Rather than analysing the data from the incident and prevalent cohorts separately, there are clear advantages to combining the data from these two subcohorts. In this paper, we discuss a nonparametric maximum likelihood estimator (NPMLE) of the survival function that uses both length-biased right-censored prevalent cohort data and right-censored incident cohort data. We establish the asymptotic properties of this NPMLE and use it to estimate the distribution of time spent in a Montreal-area hospital.
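As a simplified illustration of how the two cohorts enter a single likelihood, the sketch below ignores censoring altogether (the estimator described above does not). Uncensored incident durations contribute f(x), and length-biased prevalent durations contribute x f(x)/μ, with μ the mean duration. Maximizing over distributions supported on the observed values gives the self-consistency equation p_k = (n_k + n'_k) / (m + N' x_k / μ), where n_k and n'_k are the incident and prevalent counts at x_k, m is the incident sample size, and N' is the prevalent sample size. The function name `combined_npmle` and the toy durations are hypothetical.

```python
import numpy as np


def combined_npmle(incident, prevalent, tol=1e-10, max_iter=10_000):
    """NPMLE of the duration distribution F from uncensored incident durations
    (density f) and uncensored length-biased prevalent durations (density x f(x) / mu).

    Self-consistency update: p_k <- (n_inc_k + n_prev_k) / (m + N_p * x_k / mu).
    Censoring is ignored in this sketch.
    """
    incident = np.asarray(incident, dtype=float)
    prevalent = np.asarray(prevalent, dtype=float)
    x = np.unique(np.concatenate([incident, prevalent]))   # support points
    n_inc = np.array([np.sum(incident == v) for v in x])
    n_prev = np.array([np.sum(prevalent == v) for v in x])
    m, n_p = len(incident), len(prevalent)
    p = np.full(len(x), 1.0 / len(x))                      # start from uniform masses
    for _ in range(max_iter):
        mu = np.sum(x * p)                                   # current mean duration
        p_new = (n_inc + n_prev) / (m + n_p * x / mu)
        p_new /= p_new.sum()
        converged = np.max(np.abs(p_new - p)) < tol
        p = p_new
        if converged:
            break
    surv = 1.0 - np.cumsum(p)                                # S(x_k) = P(T > x_k)
    return x, p, surv


x, p, surv = combined_npmle(incident=[1.0, 2.0, 3.0], prevalent=[2.0, 4.0])
```

Two limiting cases serve as checks: with prevalent data only, the masses are proportional to 1/x_k (the classical correction for length bias), and with incident data only, they reduce to the empirical distribution.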
ABSTRACT
BACKGROUND: During the first year of the COVID-19 pandemic, the proportion of reported cases of COVID-19 among Canadians was under 6%. Although high vaccine coverage was achieved in Canada by fall 2021, the Omicron variant caused unprecedented numbers of infections, overwhelming testing capacity and making it difficult to quantify the trajectory of population immunity. METHODS: Using a time-series approach and data from more than 900 000 samples collected by 7 research studies collaborating with the COVID-19 Immunity Task Force (CITF), we estimated trends in SARS-CoV-2 seroprevalence owing to infection and vaccination for the Canadian population over 3 intervals: prevaccination (March to November 2020), vaccine roll-out (December 2020 to November 2021), and the arrival of the Omicron variant (December 2021 to March 2023). We also estimated seroprevalence by geographical region and age. RESULTS: By November 2021, 9.0% (95% credible interval [CrI] 7.3%-11%) of people in Canada had humoral immunity to SARS-CoV-2 from an infection. Seroprevalence increased rapidly after the arrival of the Omicron variant: by Mar. 15, 2023, 76% (95% CrI 74%-79%) of the population had detectable antibodies from infections. The rapid rise in infection-induced antibodies occurred across Canada and was most pronounced in younger age groups and in the Western provinces: Manitoba, Saskatchewan, Alberta and British Columbia. INTERPRETATION: Data up to March 2023 indicate that most people in Canada had acquired antibodies against SARS-CoV-2 through natural infection and vaccination. However, given variations in population seropositivity by age and geography, the potential for waning antibody levels, and new variants that may escape immunity, public health policy and clinical decisions should be tailored to local patterns of population immunity.
Subjects
COVID-19 , SARS-CoV-2 , Humans , COVID-19/epidemiology , Pandemics , Seroepidemiologic Studies , Alberta , Antibodies, Viral
ABSTRACT
Forecasting recruitment is a key component of the monitoring phase of multicenter studies. One of the most popular techniques in this field is the Poisson-Gamma recruitment model, a Bayesian technique built on a doubly stochastic Poisson process. This approach models enrollments as a Poisson process whose recruitment rates are assumed to be constant over time and to follow a common Gamma prior distribution. However, the constant-rate assumption is restrictive and rarely appropriate for real studies. In this paper, we illustrate a flexible generalization of this methodology that allows the enrollment rates to vary over time by modeling them with B-splines. We show the suitability of this approach for a wide range of recruitment behaviors in a simulation study and by estimating the recruitment progression of the Canadian Co-infection Cohort.
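The constant-rate model that this paper generalizes has a simple conjugate structure, sketched below with hypothetical numbers. With a Gamma(α, β) prior (rate parameterization) on center i's rate λ_i, and n_i recruits observed over τ_i time units, the posterior is Gamma(α + n_i, β + τ_i), and future recruits over a horizon t are Poisson(λ_i t). The sketch does not implement the B-spline, time-varying extension or the tPG R package; in practice the hyperparameters α and β would come from prior information or be estimated from the data.

```python
import numpy as np


def predict_recruitment(n, tau, alpha, beta, horizon, n_sims=10_000, rng=None):
    """Constant-rate Poisson-Gamma prediction of future recruitment.

    n[i]: recruits so far at center i; tau[i]: time center i has been open.
    Prior lambda_i ~ Gamma(alpha, rate=beta) gives the conjugate posterior
    lambda_i | data ~ Gamma(alpha + n[i], rate=beta + tau[i]).
    """
    rng = np.random.default_rng() if rng is None else rng
    shape = alpha + np.asarray(n, dtype=float)
    rate = beta + np.asarray(tau, dtype=float)
    expected = horizon * np.sum(shape / rate)              # posterior predictive mean
    # numpy's gamma sampler takes a scale parameter, i.e. 1 / rate
    lam = rng.gamma(shape, 1.0 / rate, size=(n_sims, shape.size))
    future = rng.poisson(lam * horizon).sum(axis=1)        # total new recruits per simulation
    return expected, future


expected, future = predict_recruitment(
    n=[12, 5, 20], tau=[100, 60, 150], alpha=1.0, beta=10.0, horizon=90,
    rng=np.random.default_rng(2))
lower, upper = np.percentile(future, [2.5, 97.5])  # 95% predictive interval
```

Because rates are drawn from their posteriors before the Poisson counts, the predictive interval reflects both rate uncertainty and Poisson noise, which is the doubly stochastic structure described above.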
Subjects
Models, Statistical , Humans , Bayes Theorem , Poisson Distribution , Canada , Computer Simulation
ABSTRACT
In this work, we examine recently developed methods for Bayesian inference of optimal dynamic treatment regimes (DTRs). A DTR is a set of treatment decision rules aimed at tailoring patient care to patient-specific characteristics, which places DTRs within the realm of precision medicine. Researchers in this field seek to tailor therapy to improve health outcomes and are therefore most interested in identifying optimal DTRs. Recent work has developed Bayesian methods for identifying the optimal DTR within a family indexed by ψ via Bayesian dynamic marginal structural models (MSMs) (Rodriguez Duque D, Stephens DA, Moodie EEM, Klein MB. Semiparametric Bayesian inference for dynamic treatment regimes via dynamic regime marginal structural models. Biostatistics; 2022. (In Press)); we review the proposed estimation procedure and illustrate its use via the new BayesDTR R package. Although the methods of Rodriguez Duque et al. (Biostatistics; 2022) can estimate optimal DTRs well, they may lead to biased estimators when the value function (the expected outcome if everyone in a population were to follow a given treatment strategy) is misspecified, or when a grid search is used to locate the optimum. We describe recent work that places a Gaussian process (GP) prior on the value function as a means to robustly identify optimal DTRs (Rodriguez Duque D, Stephens DA, Moodie EEM. Estimation of optimal dynamic treatment regimes using Gaussian processes; 2022. Available from: https://doi.org/10.48550/arXiv.2105.12259). We demonstrate how a GP approach may be implemented with the BayesDTR package and contrast it with other value-search approaches to identifying optimal DTRs.
We use data from an HIV therapeutic trial to illustrate a standard analysis with these methods, using both the original observed trial data and an additional simulated component to showcase a longitudinal (two-stage DTR) analysis.
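The idea of placing a GP prior on the value function can be illustrated with plain GP regression: noisy value estimates at a grid of ψ values are smoothed by the GP posterior mean, whose maximizer estimates the optimal ψ. The sketch below assumes a zero-mean GP with a squared-exponential kernel, fixed and hypothetical hyperparameters (length scale, signal variance, noise variance), and a made-up value function. It is not the BayesDTR implementation, in which the value estimates and hyperparameters would be handled within the Bayesian procedure.

```python
import numpy as np


def sq_exp_kernel(a, b, lengthscale=0.5, variance=1.0):
    """Squared-exponential covariance between two 1-D arrays of regime parameters."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_posterior(psi_obs, v_obs, psi_new, noise=1e-3, **kern):
    """Zero-mean GP regression of value estimates v_obs observed at psi_obs.

    Returns the posterior mean and standard deviation of the value function
    at psi_new; `noise` is the assumed variance of the value estimates.
    """
    K = sq_exp_kernel(psi_obs, psi_obs, **kern) + noise * np.eye(psi_obs.size)
    K_s = sq_exp_kernel(psi_new, psi_obs, **kern)
    L = np.linalg.cholesky(K)
    weights = np.linalg.solve(L.T, np.linalg.solve(L, v_obs))   # K^{-1} v_obs
    mean = K_s @ weights
    V = np.linalg.solve(L, K_s.T)
    var = kern.get("variance", 1.0) - np.sum(V**2, axis=0)      # prior var minus explained part
    return mean, np.sqrt(np.clip(var, 0.0, None))


def true_value(psi):
    """Hypothetical value function with its maximum at psi = 0.3."""
    return 1.0 - (psi - 0.3) ** 2


rng = np.random.default_rng(0)
psi_grid = np.linspace(-1.0, 1.0, 25)
v_hat = true_value(psi_grid) + rng.normal(0.0, 0.02, psi_grid.size)  # noisy value estimates
offset = v_hat.mean()                           # center the data for the zero-mean GP
psi_fine = np.linspace(-1.0, 1.0, 401)
mean, sd = gp_posterior(psi_grid, v_hat - offset, psi_fine)
psi_opt = psi_fine[np.argmax(mean)]             # estimated optimal regime parameter
```

Maximizing a smooth posterior mean over a fine grid, rather than picking the best raw estimate on the coarse grid, is what makes this approach less sensitive to noise in individual value estimates.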
Subjects
Models, Statistical , Precision Medicine , Humans , Bayes Theorem , Precision Medicine/methods , Biostatistics/methods
ABSTRACT
Non-parametric estimation of the survival function using observed failure time data depends on the underlying data-generating mechanism, including the ways in which the data may be censored and/or truncated. For data arising from a single source or collected from a single cohort, a wide range of estimators has been proposed and compared in the literature. Often, however, it may be possible, and indeed advantageous, to combine and then analyze survival data that have been collected under different study designs. We review non-parametric survival analysis for data obtained by combining the most common types of cohort. We have two main goals: (i) to clarify the differences in the model assumptions, and (ii) to provide a single lens through which some of the proposed estimators may be viewed. Our discussion is relevant to the meta-analysis of survival data obtained from different types of study, and to the analysis of electronic health records in the modern era.
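For reference, the standard single-cohort estimator for right-censored data is the Kaplan-Meier product-limit estimator, sketched below. It handles right censoring but not left truncation or length bias, so it is a baseline for one cohort rather than an estimator for combined cohorts. The function name and the example data are made up for illustration.

```python
import numpy as np


def kaplan_meier(time, event):
    """Kaplan-Meier estimate for right-censored data (event=1: failure, 0: censored).
    Returns the distinct failure times and S(t) just after each of them."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    t_fail = np.unique(time[event])
    surv, s = [], 1.0
    for t in t_fail:
        at_risk = np.sum(time >= t)                 # still under observation at t
        deaths = np.sum((time == t) & event)        # failures at t
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return t_fail, np.array(surv)


t, s = kaplan_meier(time=[2, 3, 3, 5, 7, 8], event=[1, 1, 0, 1, 0, 1])
```

Each factor 1 - d_j / r_j uses only the subjects still at risk, which is how right-censored observations contribute information up to their censoring time.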
ABSTRACT
In the management of most chronic conditions, which are characterized by a lack of universally effective treatments, adaptive treatment strategies (ATSs) have grown in popularity because they offer a more individualized approach. As a result, sequential multiple assignment randomized trials (SMARTs) have gained attention as the most suitable clinical trial design for formally studying these strategies. While the number of SMARTs has increased in recent years, sample size and design considerations have generally been addressed in a frequentist setting. However, standard frequentist formulae require assumptions about interim response rates and variance components, and misspecifying these can lead to incorrect sample size calculations and, correspondingly, inadequate power. The Bayesian framework offers a straightforward way to alleviate some of these concerns. In this paper, we provide sample size calculations in a Bayesian setting that yield more realistic and robust estimates by accounting for uncertainty in the inputs through the 'two priors' approach. Compared with the standard frequentist formulae, this methodology also lets us rely on fewer assumptions, integrate pre-trial knowledge, and shift the focus from the standardized effect size to the minimal detectable difference (MDD). The proposed methodology is evaluated in a thorough simulation study and is used to estimate the sample size for a full-scale SMART of an internet-based adaptive stress management intervention for patients with cardiovascular disease, using data from its pilot study conducted in two Canadian provinces.
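The 'two priors' idea can be sketched in a much simpler setting than a SMART: a two-arm comparison of a normally distributed outcome with known standard deviation σ and equal allocation. A design prior generates plausible true differences δ (here centered on a hypothetical MDD), data are simulated for each δ, and a separate analysis prior is used for a conjugate normal update. The assurance is the proportion of simulated trials in which the posterior probability P(δ > 0 | data) exceeds a threshold. All numbers and names below are hypothetical, and the paper's SMART-specific calculations are not reproduced.

```python
import numpy as np
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal CDF, elementwise."""
    return 0.5 * (1.0 + np.vectorize(erf)(np.asarray(z) / sqrt(2.0)))


def assurance(n_total, sigma, design_mean, design_sd, analysis_mean=0.0,
              analysis_sd=10.0, threshold=0.975, n_sims=20_000, rng=None):
    """Monte Carlo assurance of a two-arm trial under the 'two priors' approach."""
    rng = np.random.default_rng() if rng is None else rng
    delta = rng.normal(design_mean, design_sd, n_sims)       # design prior draws
    se2 = 4.0 * sigma**2 / n_total                           # Var(mean difference), n_total/2 per arm
    d_hat = rng.normal(delta, np.sqrt(se2))                  # simulated observed differences
    post_prec = 1.0 / analysis_sd**2 + 1.0 / se2             # analysis prior + data
    post_mean = (analysis_mean / analysis_sd**2 + d_hat / se2) / post_prec
    prob_positive = normal_cdf(post_mean * np.sqrt(post_prec))  # P(delta > 0 | data)
    return np.mean(prob_positive > threshold)


rng = np.random.default_rng(3)
# design prior centered on a hypothetical MDD of 0.3 (outcome units, sigma = 1)
curve = {n: assurance(n, sigma=1.0, design_mean=0.3, design_sd=0.1, rng=rng)
         for n in (50, 100, 200, 400)}
```

Keeping the design prior (which encodes uncertainty about the true effect) separate from the analysis prior (used for inference, here nearly vague) is what distinguishes this approach from plugging a single assumed effect size into a frequentist power formula.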
Subjects
Research Design , Humans , Sample Size , Bayes Theorem , Pilot Projects , Canada , Computer Simulation
ABSTRACT
Much of the statistical work on dynamic treatment regimes (DTRs) has been done in the frequentist paradigm, but Bayesian methods may have much to offer in this setting, as they allow for the appropriate representation and propagation of uncertainty, including at the individual level. In this work, we extend recently developed Bayesian methods for marginal structural models to the inference of DTRs. We do this (i) by linking the observational world with a world in which all patients are randomized to a DTR, thereby allowing for causal inference, and then (ii) by maximizing a posterior predictive utility, where the posterior distribution is obtained from nonparametric prior assumptions on the data-generating process of the observational world. Our approach relies on Bayesian semiparametric inference, in which inference about a finite-dimensional parameter is made while working within an infinite-dimensional space of distributions. We further study Bayesian inference of DTRs in the doubly robust setting by using posterior predictive inference and the nonparametric Bayesian bootstrap. The proposed methods allow for uncertainty quantification at the individual level, thereby enabling personalized decision-making. We examine the performance of these methods via simulation and demonstrate their utility by exploring whether HIV therapy should be adapted to a measure of patients' liver health in order to minimize liver scarring.
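A minimal sketch of the nonparametric Bayesian bootstrap in this context, assuming a single decision point, a randomized treatment with known probability 0.5, and a hypothetical regime 'treat if x > ψ': Dirichlet(1, ..., 1) weights over the observed subjects define posterior draws of a weighted, inverse-probability-weighted estimate of the regime's value. This is not the authors' full procedure, which maximizes a posterior predictive utility and covers multiple stages and the doubly robust setting; the function name and data are made up.

```python
import numpy as np


def bayesian_bootstrap_value(y, a, x, psi, propensity, n_draws=1000, rng=None):
    """Posterior draws of the value of the regime 'treat if x > psi'.

    Each draw reweights subjects with Dirichlet(1, ..., 1) weights and computes
    a normalized inverse-probability-weighted mean outcome among subjects whose
    observed treatment agrees with the regime.
    """
    rng = np.random.default_rng() if rng is None else rng
    y, a, x, propensity = (np.asarray(v, dtype=float) for v in (y, a, x, propensity))
    follows = a == (x > psi)                                  # treatment agrees with regime
    p_obs = np.where(a == 1, propensity, 1.0 - propensity)    # P(A = a_i | x_i)
    ipw = follows / p_obs
    w = rng.dirichlet(np.ones(y.size), size=n_draws)          # Bayesian bootstrap weights
    return (w @ (ipw * y)) / (w @ ipw)


rng = np.random.default_rng(4)
n = 5000
x = rng.normal(size=n)
a = rng.binomial(1, 0.5, size=n)                    # randomized treatment, P(A = 1) = 0.5
y = x + a * x + rng.normal(size=n)                  # treatment helps only when x > 0
draws = bayesian_bootstrap_value(y, a, x, psi=0.0, propensity=np.full(n, 0.5),
                                 n_draws=200, rng=rng)
ci = np.percentile(draws, [2.5, 97.5])              # 95% credible interval for the value
```

Because Y = X + A·X + ε in this simulation, the true value of the regime with ψ = 0 is E[X·1{X > 0}] = 1/√(2π) ≈ 0.40, and the posterior draws should be centered near it. Repeating the computation over several ψ values gives posterior draws of the value curve, from which an optimal ψ can be chosen draw by draw.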
Subjects
Models, Statistical , Humans , Bayes Theorem , Uncertainty , Computer Simulation
ABSTRACT
1. Parasites that infect multiple species cause major health burdens globally, but for many of these parasites the full suite of susceptible hosts is unknown. Predicting undocumented host-parasite associations will help expand knowledge of parasite host specificity, promote the development of theory in disease ecology and evolution, and support surveillance of multi-host infectious diseases. The analysis of global species interaction networks allows information to be leveraged across taxa, but link prediction at this scale is often limited by extreme network sparsity and a lack of comparable trait data across species.
2. Here we use recently developed methods to predict missing links in global mammal-parasite networks using readily available data: network properties and evolutionary relationships among hosts. We demonstrate how these link predictions can efficiently guide the collection of species interaction data and increase the completeness of global species interaction networks.
3. We amalgamate a global mammal host-parasite interaction network (>29,000 interactions) and apply a hierarchical Bayesian approach to link prediction that leverages information on network structure and scaled phylogenetic distances among hosts. We use these predictions to guide targeted literature searches for the most likely yet undocumented interactions, and identify empirical evidence supporting many of the top 'missing' links.
4. We find that link prediction in global host-parasite networks can successfully predict parasites of humans, domesticated animals and endangered wildlife; the predictions comprise both published interactions missing from existing global databases and potential but currently undocumented associations.
5. Our study provides further insight into the use of phylogenies for predicting host-parasite interactions, and highlights the utility of iterated prediction and targeted search for efficiently guiding the collection of information on host-parasite interactions. These data are critical for understanding the evolution of host specificity, and may be used to support disease surveillance by predicting missing links and targeting research towards the most likely undocumented interactions.
Subjects
Parasites , Animals , Bayes Theorem , Ecology , Host-Parasite Interactions , Mammals , Phylogeny
ABSTRACT
We consider the modeling of data generated by a latent continuous-time Markov jump process whose state space has finite but unknown dimension. Typically in such models, the number of states has to be pre-specified, and Bayesian inference for a fixed number of states has not been studied until recently. In addition, although approaches to address the problem for discrete-time models have been developed, no method has been successfully implemented for the continuous-time case. We focus on reversible jump Markov chain Monte Carlo, which allows trans-dimensional moves between different numbers of states, in order to perform Bayesian inference for the unknown number of states. Specifically, we propose an efficient split-combine move that facilitates exploration of the parameter space, and demonstrate that it can be implemented effectively at scale. We then extend this algorithm to model-based clustering, allowing both the number of states and the number of clusters to be determined during the analysis. The model formulation, inference methodology, and associated algorithm are illustrated by simulation studies. Finally, we apply this method to real data from a Canadian healthcare system in Quebec. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s11222-021-10032-8.
ABSTRACT
Large amounts of longitudinal health records are now available for dynamic monitoring of the underlying processes governing the observations. However, health status progression over time is not typically observed directly: records are observed only when a subject interacts with the system, yielding irregular and often sparse observations. This suggests that the observed trajectories should be modeled via a latent continuous-time process, potentially as a function of time-varying covariates. We develop a continuous-time hidden Markov model to analyze longitudinal data, accounting for irregular visits and different types of observations. By employing a specific missing data likelihood formulation, we can construct an efficient computational algorithm. We focus on Bayesian inference for the model, which is facilitated by an expectation-maximization algorithm and Markov chain Monte Carlo methods. Simulation studies demonstrate that these approaches can be implemented efficiently for large data sets in a fully Bayesian setting. We apply this model to a real cohort of patients with chronic obstructive pulmonary disease, with the number of drugs taken as the outcome and health care utilization indicators and patient characteristics as covariates.
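The core computation in such a model, the likelihood of counts observed at irregular visit times, can be sketched with a scaled forward algorithm in which the transition matrix over a gap of length Δt is the matrix exponential exp(QΔt) of the generator Q. The sketch assumes Poisson emissions (for a count such as the number of drugs taken), a hypothetical two-state generator, and no covariates or missing-data mechanism, so it is far simpler than the model described above. It relies on `scipy.linalg.expm` and `scipy.stats.poisson`; the function name `cthmm_loglik` is illustrative.

```python
import numpy as np
from scipy.linalg import expm
from scipy.stats import poisson


def cthmm_loglik(times, counts, Q, rates, init):
    """Scaled forward algorithm for a continuous-time HMM observed at irregular times.

    Q: generator matrix of the latent chain (off-diagonals >= 0, rows sum to 0);
    rates[j]: Poisson mean of the observed count in latent state j;
    init: distribution of the latent state at the first visit.
    """
    Q, rates, init = (np.asarray(v, dtype=float) for v in (Q, rates, init))
    alpha = init * poisson.pmf(counts[0], rates)
    loglik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for k in range(1, len(times)):
        P = expm(Q * (times[k] - times[k - 1]))    # transition probabilities over this gap
        alpha = (alpha @ P) * poisson.pmf(counts[k], rates)
        c = alpha.sum()
        loglik += np.log(c)
        alpha = alpha / c                          # rescale to avoid underflow
    return loglik


Q = np.array([[-0.2, 0.2], [0.1, -0.1]])   # hypothetical 2-state generator
times = [0.0, 0.3, 2.1, 2.2, 5.0]            # irregular visit times
counts = [1, 0, 3, 5, 4]                     # e.g. number of drugs recorded at each visit
ll = cthmm_loglik(times, counts, Q, rates=[1.0, 4.0], init=[0.5, 0.5])
```

Because each gap gets its own transition matrix, no discretization of time is needed: closely spaced visits are strongly dependent, while widely spaced ones are nearly independent draws from the chain's long-run behavior.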
Subjects
Algorithms , Electronic Health Records , Bayes Theorem , Humans , Markov Chains , Monte Carlo Method
ABSTRACT
Most estimation algorithms for adaptive treatment strategies assume that the treatment rules at different decision points are independent of one another, in the sense that they share no common parameters. This is often unrealistic, as the same decisions may be made repeatedly over time. Sharing treatment-decision parameters across decision points offers several advantages, including the estimation of fewer parameters and the clinical ease of implementing a single, time-invariant decision rule. We propose a new computational approach to shared-parameter G-estimation that is efficient and shares the double robustness of the "unshared" sequential G-estimation. We use this approach to analyze data from the Scottish Early Rheumatoid Arthritis (SERA) Inception Cohort.
ABSTRACT
Dynamic treatment regimes (DTRs) aim to formalize personalized medicine by tailoring treatment decisions to individual patient characteristics. G-estimation for DTR identification targets the parameters of a structural nested mean model, known as the blip function, from which the optimal DTR is derived. Despite its potential, G-estimation has not seen widespread use, owing in part to its often complex presentation and implementation, and in part to the need for correct specification of the blip. Using a quadratic approximation approach inspired by iteratively reweighted least squares, we derive a quasi-likelihood function for G-estimation within the DTR framework and show how it can be used to form an information criterion for blip model selection. We outline the theoretical properties of this model selection criterion and demonstrate its application in a variety of simulation studies, as well as on data from the Sequenced Treatment Alternatives to Relieve Depression study.
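For a single decision point, G-estimation of a linear blip γ(x, a; ψ) = a(ψ_0 + ψ_1 x) reduces to a linear estimating equation, which the sketch below solves directly. It assumes a known propensity score and a linear treatment-free model; the simulated outcome deliberately has a quadratic treatment-free part, so the example also illustrates double robustness (consistency when the propensity model is correct even though the treatment-free model is not). The quasi-likelihood and information criterion proposed in the paper are not implemented here, and the function names are hypothetical.

```python
import numpy as np


def g_estimate(y, a, x, propensity):
    """G-estimation of the blip gamma(x, a; psi) = a * (psi0 + psi1 * x),
    using a linear treatment-free model beta0 + beta1 * x.

    Solves sum_i Z_i (y_i - D_i theta) = 0 with theta = (psi, beta),
    D_i = (a_i * [1, x_i], [1, x_i]) and Z_i = ((a_i - pi_i) * [1, x_i], [1, x_i]).
    """
    X = np.column_stack([np.ones_like(x), x])
    D = np.hstack([a[:, None] * X, X])
    Z = np.hstack([(a - propensity)[:, None] * X, X])
    theta = np.linalg.solve(Z.T @ D, Z.T @ y)
    return theta[:2]


def recommended_treatment(x_new, psi):
    """Estimated optimal rule: treat when the estimated blip is positive."""
    return (psi[0] + psi[1] * np.asarray(x_new) > 0).astype(int)


rng = np.random.default_rng(5)
n = 10_000
x = rng.normal(size=n)
pi = 1.0 / (1.0 + np.exp(-0.5 * x))                  # known propensity P(A = 1 | x)
a = rng.binomial(1, pi).astype(float)
# quadratic treatment-free part: the linear treatment-free model is misspecified,
# but the blip estimate stays consistent because the propensity is correct
y = 1.0 + x + x**2 + a * (1.0 - 2.0 * x) + rng.normal(size=n)
psi_hat = g_estimate(y, a, x, pi)                    # true psi = (1, -2)
```

With the true blip 1 - 2x, the optimal rule treats patients with x < 0.5; `recommended_treatment` applies the estimated version of that rule to new covariate values.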
Subjects
Models, Statistical , Precision Medicine/methods , Computer Simulation , Depression/prevention & control , Humans , Least-Squares Analysis , Likelihood Functions
ABSTRACT
BACKGROUND: Phylogenetics has been used to investigate HIV transmission among men who have sex with men (MSM). This study compares several methodologies to elucidate the role of transmission chains in the dynamics of HIV spread in Quebec, Canada. METHODS: The Quebec Human Immunodeficiency Virus (HIV) genotyping program database now includes viral sequences from close to 4,000 HIV-positive individuals classified as MSM, collected between 1996 and early 2016. Because the assessment of chain expansion may depend on the partitioning scheme used, we produce estimates from several methods: the conventional Bayesian and maximum likelihood-bootstrap methods, in combination with a variety of schemes for applying a maximum distance criterion, and two other algorithms, namely DM-PhyClus, a Bayesian algorithm that produces a measure of uncertainty for proposed partitions, and the Gap Procedure, a fast non-phylogenetic approach. Sequences obtained from individuals in the primary HIV infection (PHI) stage serve to identify incident cases. We focus on the period from January 1st 2012 to February 1st 2016. RESULTS AND CONCLUSION: The analyses reveal considerable overlap between the chain estimates obtained from the conventional methods, leading to similar estimates of recent temporal expansion. The Gap Procedure and DM-PhyClus, however, suggest moderately different chains. Nevertheless, all estimates indicate that longer, older chains are responsible for a sizeable proportion of the sampled incident cases among MSM. Curbing the HIV epidemic will require strategies aimed specifically at preventing such growth.
Subjects
HIV Infections/transmission , HIV-1 , Algorithms , Bayes Theorem , Epidemics , Genotype , HIV Infections/epidemiology , HIV Infections/virology , HIV-1/classification , HIV-1/genetics , Homosexuality, Male , Humans , Likelihood Functions , Male , Molecular Epidemiology , Phylogeny , Quebec/epidemiology , pol Gene Products, Human Immunodeficiency Virus/genetics
ABSTRACT
Transplantation is often curative for cancers, but immunosuppressive drugs are required to prevent and, if needed, to treat graft-versus-host disease. Estimation of an optimal adaptive treatment strategy when treatment at either of two stages may lead to a cure has not yet been considered. Using a sample of 9563 patients treated for blood and bone cancers by allogeneic hematopoietic cell transplantation, drawn from the Center for Blood and Marrow Transplant Research database, we provide a case study of a novel approach to Q-learning for survival data in the presence of a potentially curative treatment, and demonstrate that the results differ substantially from those of a Q-learning implementation that fails to account for the cure rate.
Subjects
Biostatistics/methods , Hematopoietic Stem Cell Transplantation/adverse effects , Immunosuppressive Agents/pharmacology , Machine Learning , Graft vs Host Disease/etiology , Graft vs Host Disease/prevention & control , Humans , Neoplasms/immunology , Neoplasms/therapy , Treatment Outcome
ABSTRACT
DNA methylation studies have enabled researchers to understand methylation patterns and their regulatory roles in biological processes and disease. However, only a limited number of statistical approaches have been developed to provide formal quantitative analysis. A few available methods do identify differentially methylated CpG (DMC) sites or regions (DMR), but they suffer from limitations that arise mostly from challenges inherent in bisulfite sequencing data. These challenges include the following: (1) read depths vary considerably among genomic positions and are often low; (2) both methylation and autocorrelation patterns change from region to region; and (3) CpG sites are distributed unevenly. Furthermore, there are several methodological limitations: almost none of these tools can compare multiple groups and/or work with missing values, and only a few allow continuous or multiple covariates. The last of these is of great interest to researchers, as the goal is often to find which regions of the genome are associated with several exposures and traits. To tackle these issues, we have developed an efficient DMC identification method based on hidden Markov models (HMMs), called "DMCHMM", a three-step approach (model selection, prediction, testing) that aims to address the aforementioned drawbacks. Our proposed method differs from other HMM methods in that it profiles the methylation of each sample separately, thereby exploiting inter-CpG autocorrelation within samples, and it is more flexible than previous approaches because it allows multiple hidden states. Using simulations, we show that DMCHMM has the best performance among several competing methods. An analysis of cell-separated blood methylation profiles is also provided.
Subjects
CpG Islands/genetics , DNA Methylation , Markov Chains , Sulfites , Algorithms , Animals , Binding Sites , Blood Cells/metabolism , Computer Simulation/economics , Computer Simulation/statistics & numerical data , Humans , Sequence Analysis, DNA/methods
ABSTRACT
In investigations of the effect of a treatment on an outcome, the propensity score is a tool for eliminating imbalance in the distribution of confounding variables between treatment groups. Recent work has suggested that Super Learner, an ensemble method, outperforms logistic regression in nonlinear settings; however, experience with real-data analyses tends to show overfitting of the propensity score model when this approach is used. We investigated a wide range of simulated settings of varying complexity, including simulations based on real data, to compare the performance of logistic regression, generalized boosted models, and Super Learner in achieving covariate balance and in estimating the average treatment effect via propensity score regression, propensity score matching, and inverse probability of treatment weighting. We found that Super Learner and logistic regression are comparable in terms of covariate balance, bias, and mean squared error (MSE); however, Super Learner is computationally very expensive, leaving no clear advantage to the more complex approach. Propensity scores estimated by generalized boosted models were inferior to those from the other two approaches. We also found that propensity score regression adjustment was superior to either matching or inverse weighting when the form of the dependence of the outcome on the treatment is correctly specified.
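Inverse probability of treatment weighting with a logistic regression propensity model, one of the combinations compared above, can be sketched in a few lines. The sketch fits the logistic model by Newton-Raphson to stay dependency-free and uses normalized (Hajek) weights; the simulated data, with a true average treatment effect of 2, are hypothetical, and neither Super Learner nor generalized boosted models are shown.

```python
import numpy as np


def fit_logistic(X, a, n_iter=25):
    """Logistic regression by Newton-Raphson; X already contains an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (a - p)                              # score
        hess = X.T @ (X * (p * (1.0 - p))[:, None])       # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def iptw_ate(y, a, X):
    """Average treatment effect by inverse probability of treatment weighting,
    using normalized (Hajek) weighted means in each arm."""
    ps = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, a)))   # estimated propensity scores
    w1, w0 = a / ps, (1.0 - a) / (1.0 - ps)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)


rng = np.random.default_rng(6)
n = 5000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-0.3 + 0.8 * x1 - 0.5 * x2)))).astype(float)
y = 2.0 * a + x1 + x2 + rng.normal(size=n)     # true ATE = 2, confounded by x1 and x2
naive = y[a == 1].mean() - y[a == 0].mean()    # unadjusted difference in means
ate = iptw_ate(y, a, X)
```

Comparing `naive` with `ate` shows the effect of reweighting: the unadjusted difference mixes the treatment effect with covariate imbalance, whereas the weighted estimate targets the average treatment effect when the propensity model is correctly specified.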