|

1.

Forecasting the onset of depression with limited baseline data only: A comparison of a person-specific and a multilevel modeling based exponentially weighted moving average approach.

Schat, Evelien; Tuerlinckx, Francis; Schreuder, Marieke J; De Ketelaere, Bart; Ceulemans, Eva.

Psychol Assess ; 36(6-7): 379-394, 2024.

Article En | MEDLINE | ID: mdl-38829348

The onset of depressive episodes is preceded by changes in mean levels of affective experiences, which can be detected using the exponentially weighted moving average procedure on experience sampling method (ESM) data. Applying the exponentially weighted moving average procedure requires sufficient baseline data from the person under study in healthy times, which is needed to calculate a control limit for monitoring incoming ESM data. It is, however, not trivial to obtain sufficient baseline data from a single person. We therefore investigate whether historical ESM data from healthy individuals can help establish an adequate control limit for the person under study via multilevel modeling. Specifically, we focus on the case in which there is very little baseline data available of the person under study (i.e., up to 7 days). This multilevel approach is compared with the traditional, person-specific approach, where estimates are obtained using the person's available baseline data. Predictive performance in terms of Matthews correlation coefficient did not differ much between the approaches; however, the multilevel approach was more sensitive at detecting mean changes. This implies that for low-cost and nonharmful interventions, the multilevel approach may prove particularly beneficial. (PsycInfo Database Record (c) 2024 APA, all rights reserved).

Ecological Momentary Assessment , Multilevel Analysis , Humans , Adult , Female , Male , Depression/psychology , Depression/diagnosis , Models, Statistical , Young Adult , Middle Aged

2.

Estimating classification consistency of machine learning models for screening measures.

Gonzalez, Oscar; Georgeson, A R; Pelham, William E.

Psychol Assess ; 36(6-7): 395-406, 2024.

Article En | MEDLINE | ID: mdl-38829349

This article illustrates novel quantitative methods to estimate classification consistency in machine learning models used for screening measures. Screening measures are used in psychology and medicine to classify individuals into diagnostic classifications. In addition to achieving high accuracy, it is ideal for the screening process to have high classification consistency, which means that respondents would be classified into the same group every time if the assessment was repeated. Although machine learning models are increasingly being used to predict a screening classification based on individual item responses, methods to describe the classification consistency of machine learning models have not yet been developed. This article addresses this gap by describing methods to estimate classification inconsistency in machine learning models arising from two different sources: sampling error during model fitting and measurement error in the item responses. These methods use data resampling techniques such as the bootstrap and Monte Carlo sampling. These methods are illustrated using three empirical examples predicting a health condition/diagnosis from item responses. R code is provided to facilitate the implementation of the methods. This article highlights the importance of considering classification consistency alongside accuracy when studying screening measures and provides the tools and guidance necessary for applied researchers to obtain classification consistency indices in their machine learning research on diagnostic assessments. (PsycInfo Database Record (c) 2024 APA, all rights reserved).

Machine Learning , Humans , Models, Statistical , Mass Screening

3.

Generalized Confidence Intervals for Ratios of Standard Deviations Based on Log-Normal Distribution when Times Follow Weibull Distributions.

Chen, Pei-Fu; Dexter, Franklin.

J Med Syst ; 48(1): 58, 2024 Jun 01.

Article En | MEDLINE | ID: mdl-38822876

Modern anesthetic drugs ensure the efficacy of general anesthesia. Goals include reducing variability in surgical, tracheal extubation, post-anesthesia care unit, or intraoperative response recovery times. Generalized confidence intervals based on the log-normal distribution compare variability between groups, specifically ratios of standard deviations. The alternative statistical approaches, performing robust variance comparison tests, give P-values, not point estimates nor confidence intervals for the ratios of the standard deviations. We performed Monte-Carlo simulations to learn what happens to confidence intervals for ratios of standard deviations of anesthesia-associated times when analyses are based on the log-normal, but the true distributions are Weibull. We used simulation conditions comparable to meta-analyses of most randomized trials in anesthesia, n ≈ 25 and coefficients of variation ≈ 0.30 . The estimates of the ratios of standard deviations were positively biased, but slightly, the ratios being 0.11% to 0.33% greater than nominal. In contrast, the 95% confidence intervals were very wide (i.e., > 95% of P ≥ 0.05). Although substantive inferentially, the differences in the confidence limits were small from a clinical or managerial perspective, with a maximum absolute difference in ratios of 0.016. Thus, P < 0.05 is reliable, but investigators should plan for Type II errors at greater than nominal rates.

Monte Carlo Method , Humans , Confidence Intervals , Anesthesia, General , Time Factors , Models, Statistical

4.

On exact randomization-based covariate-adjusted confidence intervals.

Fiksel, Jacob.

Biometrics ; 80(2)2024 Mar 27.

Article En | MEDLINE | ID: mdl-38837900

Randomization-based inference using the Fisher randomization test allows for the computation of Fisher-exact P-values, making it an attractive option for the analysis of small, randomized experiments with non-normal outcomes. Two common test statistics used to perform Fisher randomization tests are the difference-in-means between the treatment and control groups and the covariate-adjusted version of the difference-in-means using analysis of covariance. Modern computing allows for fast computation of the Fisher-exact P-value, but confidence intervals have typically been obtained by inverting the Fisher randomization test over a range of possible effect sizes. The test inversion procedure is computationally expensive, limiting the usage of randomization-based inference in applied work. A recent paper by Zhu and Liu developed a closed form expression for the randomization-based confidence interval using the difference-in-means statistic. We develop an important extension of Zhu and Liu to obtain a closed form expression for the randomization-based covariate-adjusted confidence interval and give practitioners a sufficiency condition that can be checked using observed data and that guarantees that these confidence intervals have correct coverage. Simulations show that our procedure generates randomization-based covariate-adjusted confidence intervals that are robust to non-normality and that can be calculated in nearly the same time as it takes to calculate the Fisher-exact P-value, thus removing the computational barrier to performing randomization-based inference when adjusting for covariates. We also demonstrate our method on a re-analysis of phase I clinical trial data.

Computer Simulation , Confidence Intervals , Humans , Biometry/methods , Models, Statistical , Data Interpretation, Statistical , Random Allocation , Randomized Controlled Trials as Topic/statistics & numerical data , Randomized Controlled Trials as Topic/methods

5.

Rejoinder to "On exact randomization-based covariate-adjusted confidence intervals" by Jacob Fiksel.

Zhu, Ke; Liu, Hanzhong.

Biometrics ; 80(2)2024 Mar 27.

Article En | MEDLINE | ID: mdl-38837901

Random Allocation , Confidence Intervals , Humans , Biometry/methods , Data Interpretation, Statistical , Models, Statistical

6.

Incorporating nonparametric methods for estimating causal excursion effects in mobile health with zero-inflated count outcomes.

Liu, Xueqing; Qian, Tianchen; Bell, Lauren; Chakraborty, Bibhas.

Biometrics ; 80(2)2024 Mar 27.

Article En | MEDLINE | ID: mdl-38837902

In mobile health, tailoring interventions for real-time delivery is of paramount importance. Micro-randomized trials have emerged as the "gold-standard" methodology for developing such interventions. Analyzing data from these trials provides insights into the efficacy of interventions and the potential moderation by specific covariates. The "causal excursion effect," a novel class of causal estimand, addresses these inquiries. Yet, existing research mainly focuses on continuous or binary data, leaving count data largely unexplored. The current work is motivated by the Drink Less micro-randomized trial from the UK, which focuses on a zero-inflated proximal outcome, i.e., the number of screen views in the subsequent hour following the intervention decision point. To be specific, we revisit the concept of causal excursion effect, specifically for zero-inflated count outcomes, and introduce novel estimation approaches that incorporate nonparametric techniques. Bidirectional asymptotics are established for the proposed estimators. Simulation studies are conducted to evaluate the performance of the proposed methods. As an illustration, we also implement these methods to the Drink Less trial data.

Computer Simulation , Telemedicine , Humans , Telemedicine/statistics & numerical data , Statistics, Nonparametric , Causality , Randomized Controlled Trials as Topic , Models, Statistical , Biometry/methods , Data Interpretation, Statistical

7.

Two-part predictive modeling for COVID-19 cases and deaths in the U.S.

Le, Teresa-Thuong; Liao, Xiyue.

PLoS One ; 19(6): e0302324, 2024.

Article En | MEDLINE | ID: mdl-38843223

COVID-19 prediction has been essential in the aid of prevention and control of the disease. The motivation of this case study is to develop predictive models for COVID-19 cases and deaths based on a cross-sectional data set with a total of 28,955 observations and 18 variables, which is compiled from 5 data sources from Kaggle. A two-part modeling framework, in which the first part is a logistic classifier and the second part includes machine learning or statistical smoothing methods, is introduced to model the highly skewed distribution of COVID-19 cases and deaths. We also aim to understand what factors are most relevant to COVID-19's occurrence and fatality. Evaluation criteria such as root mean squared error (RMSE) and mean absolute error (MAE) are used. We find that the two-part XGBoost model perform best with predicting the entire distribution of COVID-19 cases and deaths. The most important factors relevant to either COVID-19 cases or deaths include population and the rate of primary care physicians.

COVID-19 , Machine Learning , COVID-19/mortality , COVID-19/epidemiology , Humans , United States/epidemiology , SARS-CoV-2/isolation & purification , Models, Statistical , Cross-Sectional Studies

8.

Association of longitudinal pet ownership with wheezing in 3-year-old children using the distributed lag model: the Japan Environment and Children's Study.

Shirato, Kota; Oba, Koji; Matsuyama, Yutaka; Hagiwara, Yasuhiro.

Environ Health ; 23(1): 53, 2024 Jun 06.

Article En | MEDLINE | ID: mdl-38844911

BACKGROUND: Time-varying exposures like pet ownership pose challenges for identifying critical windows due to multicollinearity when modeled simultaneously. The Distributed Lag Model (DLM) estimates critical windows for time-varying exposures, which are mainly continuous variables. However, applying complex functions such as high-order splines and nonlinear functions within DLMs may not be suitable for situations with limited time points or binary exposure, such as in questionnaire surveys. OBJECTIVES: (1) We examined the estimation performance of a simple DLM with fractional polynomial function for time-varying binary exposures through simulation experiments. (2) We evaluated the impact of pet ownership on childhood wheezing onset and estimate critical windows. METHODS: (1) We compared logistic regression including time-varying exposure in separate models, in one model simultaneously, and using DLM. For evaluation, we employed bias, empirical standard error (EmpSE), and mean squared error (MSE). (2) The Japan Environment and Children's Study (JECS) is a prospective birth cohort study of approximately 100,000 parent-child pairs, registered across Japan from 2011 to 2014. We applied DLM to the JECS data up to age 3. The estimated odds ratios (OR) were considered to be within critical windows when they were significant at the 5% level. RESULTS: (1) DLM and the separate model exhibited lower bias compared to the simultaneously model. Additionally, both DLM and the simultaneously model demonstrated lower EmpSEs than the separate model. In all scenarios, DLM had lower MSEs than the other methods. Specifically, where critical windows is clearly present and exposure correlation is high, DLM showed MSEs about 1/2 to 1/200 of those of other models. (2) Application of DLM to the JECS data showed that, unlike other models, a significant exposure effect was observed only between the ages of 0 and 6 months. During that periods, the highest ORs were 1.07 (95% confidence interval, 1.01 to 1.14) , observed between the ages of 2 and 5 months. CONCLUSIONS: (1) A simple DLM improves the accuracy of exposure effect and critical windows estimation. (2) 0-6 months may be the critical windows for the effect of pet ownership on the wheezing onset at 3 years.

Ownership , Pets , Respiratory Sounds , Humans , Japan/epidemiology , Child, Preschool , Female , Male , Ownership/statistics & numerical data , Animals , Environmental Exposure/adverse effects , Prospective Studies , Infant , Models, Statistical , Longitudinal Studies , Logistic Models

9.

Updated fiducial distribution of parameters in the associated delta-lognormal population.

Wang, Yufan; Xu, Xingzhong.

PLoS One ; 19(6): e0298307, 2024.

Article En | MEDLINE | ID: mdl-38838002

In this paper we consider a special kind of semicontinous distribution. We try to concern with the situation where the probability of zero observation is associated with the location and scale parameters in lognormal distribution. We first propose a goodness-of-fit test to ensure that the data can be fit by the associated delta-lognormal distribution. Then we define the updated fiducial distributions of the parameters and establish the results that the confidence interval has asymtotically correct level while the significance level of the hypothesis testing is also asymtotically correct. We propose an exact sampling method to sample from the updated fiducial distribution. It can be seen in our simulation study that the inference on the parameters is largely improved. A real data example is also used to illustrate our method.

Computer Simulation , Models, Statistical , Humans , Algorithms

10.

Using a Leroux-prior-based conditional autoregression-based strategy to map the short-term association between temperature and bacillary dysentery and its attributable burden in China.

Wang, Jianping; Lu, Kai; Wei, Yuxin; Wang, Wei; Zhou, Yongming; Zeng, Jing; Deng, Ying; Zhang, Tao; Yin, Fei; Ma, Yue; Shui, Tiejun.

Front Public Health ; 12: 1297635, 2024.

Article En | MEDLINE | ID: mdl-38827625

Background: In China, bacillary dysentery (BD) is the third most frequently reported infectious disease, with the greatest annual incidence rate of 38.03 cases per 10,000 person-years. It is well acknowledged that temperature is associated with BD and the previous studies of temperature-BD association in different provinces of China present a considerable heterogeneity, which may lead to an inaccurate estimation for a region-specific association and incorrect attributable burdens. Meanwhile, the common methods for multi-city studies, such as stratified strategy and meta-analysis, have their own limitations in handling the heterogeneity. Therefore, it is necessary to adopt an appropriate method considering the spatial autocorrelation to accurately characterize the spatial distribution of temperature-BD association and obtain its attributable burden in 31 provinces of China. Methods: A novel three-stage strategy was adopted. In the first stage, we used the generalized additive model (GAM) model to independently estimate the province-specific association between monthly average temperature (MAT) and BD. In the second stage, the Leroux-prior-based conditional autoregression (LCAR) was used to spatially smooth the association and characterize its spatial distribution. In the third stage, we calculate the attribute BD cases based on a more accurate estimation of association. Results: The smoothed association curves generally show a higher relative risk with a higher MAT, but some of them have an inverted "V" shape. Meanwhile, the spatial distribution of association indicates that western provinces have a higher relative risk of MAT than eastern provinces with 0.695 and 0.645 on average, respectively. The maximum and minimum total attributable number of cases are 224,257 in Beijing and 88,906 in Hainan, respectively. The average values of each province in the eastern, western, and central areas are approximately 40,991, 42,025, and 26,947, respectively. Conclusion: Based on the LCAR-based three-stage strategy, we can obtain a more accurate spatial distribution of temperature-BD association and attributable BD cases. Furthermore, the results can help relevant institutions to prevent and control the epidemic of BD efficiently.

Dysentery, Bacillary , Temperature , China/epidemiology , Humans , Dysentery, Bacillary/epidemiology , Incidence , Spatial Analysis , Models, Statistical

11.

On the sensitivity of centrality metrics.

Cavallaro, Lucia; De Meo, Pasquale; Fiumara, Giacomo; Liotta, Antonio.

PLoS One ; 19(5): e0299255, 2024.

Article En | MEDLINE | ID: mdl-38722923

Despite the huge importance that the centrality metrics have in understanding the topology of a network, too little is known about the effects that small alterations in the topology of the input graph induce in the norm of the vector that stores the node centralities. If so, then it could be possible to avoid re-calculating the vector of centrality metrics if some minimal changes occur in the network topology, which would allow for significant computational savings. Hence, after formalising the notion of centrality, three of the most basic metrics were herein considered (i.e., Degree, Eigenvector, and Katz centrality). To perform the simulations, two probabilistic failure models were used to describe alterations in network topology: Uniform (i.e., all nodes can be independently deleted from the network with a fixed probability) and Best Connected (i.e., the probability a node is removed depends on its degree). Our analysis suggests that, in the case of degree, small variations in the topology of the input graph determine small variations in Degree centrality, independently of the topological features of the input graph; conversely, both Eigenvector and Katz centralities can be extremely sensitive to changes in the topology of the input graph. In other words, if the input graph has some specific features, even small changes in the topology of the input graph can have catastrophic effects on the Eigenvector or Katz centrality.

Algorithms , Computer Simulation , Models, Theoretical , Models, Statistical , Probability

12.

Weibull parametric model for survival analysis in women with endometrial cancer using clinical and T2-weighted MRI radiomic features.

Li, Xingfeng; Marcus, Diana; Russell, James; Aboagye, Eric O; Ellis, Laura Burney; Sheeka, Alexander; Park, Won-Ho Edward; Bharwani, Nishat; Ghaem-Maghami, Sadaf; Rockall, Andrea G.

BMC Med Res Methodol ; 24(1): 107, 2024 May 09.

Article En | MEDLINE | ID: mdl-38724889

BACKGROUND: Semiparametric survival analysis such as the Cox proportional hazards (CPH) regression model is commonly employed in endometrial cancer (EC) study. Although this method does not need to know the baseline hazard function, it cannot estimate event time ratio (ETR) which measures relative increase or decrease in survival time. To estimate ETR, the Weibull parametric model needs to be applied. The objective of this study is to develop and evaluate the Weibull parametric model for EC patients' survival analysis. METHODS: Training (n = 411) and testing (n = 80) datasets from EC patients were retrospectively collected to investigate this problem. To determine the optimal CPH model from the training dataset, a bi-level model selection with minimax concave penalty was applied to select clinical and radiomic features which were obtained from T2-weighted MRI images. After the CPH model was built, model diagnostic was carried out to evaluate the proportional hazard assumption with Schoenfeld test. Survival data were fitted into a Weibull model and hazard ratio (HR) and ETR were calculated from the model. Brier score and time-dependent area under the receiver operating characteristic curve (AUC) were compared between CPH and Weibull models. Goodness of the fit was measured with Kolmogorov-Smirnov (KS) statistic. RESULTS: Although the proportional hazard assumption holds for fitting EC survival data, the linearity of the model assumption is suspicious as there are trends in the age and cancer grade predictors. The result also showed that there was a significant relation between the EC survival data and the Weibull distribution. Finally, it showed that Weibull model has a larger AUC value than CPH model in general, and it also has smaller Brier score value for EC survival prediction using both training and testing datasets, suggesting that it is more accurate to use the Weibull model for EC survival analysis. CONCLUSIONS: The Weibull parametric model for EC survival analysis allows simultaneous characterization of the treatment effect in terms of the hazard ratio and the event time ratio (ETR), which is likely to be better understood. This method can be extended to study progression free survival and disease specific survival. TRIAL REGISTRATION: ClinicalTrials.gov NCT03543215, https://clinicaltrials.gov/ , date of registration: 30th June 2017.

Endometrial Neoplasms , Magnetic Resonance Imaging , Proportional Hazards Models , Humans , Female , Endometrial Neoplasms/mortality , Endometrial Neoplasms/diagnostic imaging , Middle Aged , Magnetic Resonance Imaging/methods , Retrospective Studies , Survival Analysis , Aged , ROC Curve , Adult , Models, Statistical , Radiomics

13.

Upstrapping to determine futility: predicting future outcomes nonparametrically from past data.

Wild, Jessica L; Ginde, Adit A; Lindsell, Christopher J; Kaizer, Alexander M.

Trials ; 25(1): 312, 2024 May 09.

Article En | MEDLINE | ID: mdl-38725072

BACKGROUND: Clinical trials often involve some form of interim monitoring to determine futility before planned trial completion. While many options for interim monitoring exist (e.g., alpha-spending, conditional power), nonparametric based interim monitoring methods are also needed to account for more complex trial designs and analyses. The upstrap is one recently proposed nonparametric method that may be applied for interim monitoring. METHODS: Upstrapping is motivated by the case resampling bootstrap and involves repeatedly sampling with replacement from the interim data to simulate thousands of fully enrolled trials. The p-value is calculated for each upstrapped trial and the proportion of upstrapped trials for which the p-value criteria are met is compared with a pre-specified decision threshold. To evaluate the potential utility for upstrapping as a form of interim futility monitoring, we conducted a simulation study considering different sample sizes with several different proposed calibration strategies for the upstrap. We first compared trial rejection rates across a selection of threshold combinations to validate the upstrapping method. Then, we applied upstrapping methods to simulated clinical trial data, directly comparing their performance with more traditional alpha-spending and conditional power interim monitoring methods for futility. RESULTS: The method validation demonstrated that upstrapping is much more likely to find evidence of futility in the null scenario than the alternative across a variety of simulations settings. Our three proposed approaches for calibration of the upstrap had different strengths depending on the stopping rules used. Compared to O'Brien-Fleming group sequential methods, upstrapped approaches had type I error rates that differed by at most 1.7% and expected sample size was 2-22% lower in the null scenario, while in the alternative scenario power fluctuated between 15.7% lower and 0.2% higher and expected sample size was 0-15% lower. CONCLUSIONS: In this proof-of-concept simulation study, we evaluated the potential for upstrapping as a resampling-based method for futility monitoring in clinical trials. The trade-offs in expected sample size, power, and type I error rate control indicate that the upstrap can be calibrated to implement futility monitoring with varying degrees of aggressiveness and that performance similarities can be identified relative to considered alpha-spending and conditional power futility monitoring methods.

Clinical Trials as Topic , Computer Simulation , Medical Futility , Research Design , Humans , Clinical Trials as Topic/methods , Sample Size , Data Interpretation, Statistical , Models, Statistical , Treatment Outcome

14.

Network meta-analysis for an ordinal outcome when outcome categorization varies across trials.

Morris, Paul; Wang, Chong; O'Connor, Annette.

Syst Rev ; 13(1): 128, 2024 May 09.

Article En | MEDLINE | ID: mdl-38725074

BACKGROUND: Binary outcomes are likely the most common in randomized controlled trials, but ordinal outcomes can also be of interest. For example, rather than simply collecting data on diseased versus healthy study subjects, investigators may collect information on the severity of disease, with no disease, mild, moderate, and severe disease as possible levels of the outcome. While some investigators may be interested in all levels of the ordinal variable, others may combine levels that are not of particular interest. Therefore, when research synthesizers subsequently conduct a network meta-analysis on a network of trials for which an ordinal outcome was measured, they may encounter a network in which outcome categorization varies across trials. METHODS: The standard method for network meta-analysis for an ordinal outcome based on a multinomial generalized linear model is not designed to accommodate the multiple outcome categorizations that might occur across trials. In this paper, we propose a network meta-analysis model for an ordinal outcome that allows for multiple categorizations. The proposed model incorporates the partial information provided by trials that combine levels through modification of the multinomial likelihoods of the affected arms, allowing for all available data to be considered in estimation of the comparative effect parameters. A Bayesian fixed effect model is used throughout, where the ordinality of the outcome is accounted for through the use of the adjacent-categories logit link. RESULTS: We illustrate the method by analyzing a real network of trials on the use of antibiotics aimed at preventing liver abscesses in beef cattle and explore properties of the estimates of the comparative effect parameters through simulation. We find that even with the categorization of the levels varying across trials, the magnitudes of the biases are relatively small and that under a large sample size, the root mean square errors become small as well. CONCLUSIONS: Our proposed method to conduct a network meta-analysis for an ordinal outcome when the categorization of the outcome varies across trials, which utilizes the adjacent-categories logit link, performs well in estimation. Because the method considers all available data in a single estimation, it will be particularly useful to research synthesizers when the network of interest has only a limited number of trials for each categorization of the outcome.

Network Meta-Analysis , Humans , Randomized Controlled Trials as Topic , Outcome Assessment, Health Care , Models, Statistical

15.

Using Bayesian statistics in confirmatory clinical trials in the regulatory setting: a tutorial review.

Lee, Se Yoon.

BMC Med Res Methodol ; 24(1): 110, 2024 May 07.

Article En | MEDLINE | ID: mdl-38714936

Bayesian statistics plays a pivotal role in advancing medical science by enabling healthcare companies, regulators, and stakeholders to assess the safety and efficacy of new treatments, interventions, and medical procedures. The Bayesian framework offers a unique advantage over the classical framework, especially when incorporating prior information into a new trial with quality external data, such as historical data or another source of co-data. In recent years, there has been a significant increase in regulatory submissions using Bayesian statistics due to its flexibility and ability to provide valuable insights for decision-making, addressing the modern complexity of clinical trials where frequentist trials are inadequate. For regulatory submissions, companies often need to consider the frequentist operating characteristics of the Bayesian analysis strategy, regardless of the design complexity. In particular, the focus is on the frequentist type I error rate and power for all realistic alternatives. This tutorial review aims to provide a comprehensive overview of the use of Bayesian statistics in sample size determination, control of type I error rate, multiplicity adjustments, external data borrowing, etc., in the regulatory environment of clinical trials. Fundamental concepts of Bayesian sample size determination and illustrative examples are provided to serve as a valuable resource for researchers, clinicians, and statisticians seeking to develop more complex and innovative designs.

Bayes Theorem , Clinical Trials as Topic , Humans , Clinical Trials as Topic/methods , Clinical Trials as Topic/statistics & numerical data , Research Design/standards , Sample Size , Data Interpretation, Statistical , Models, Statistical

16.

Forecasting the spread of COVID-19 based on policy, vaccination, and Omicron data.

Han, Kyulhee; Lee, Bogyeom; Lee, Doeun; Heo, Gyujin; Oh, Jooha; Lee, Seoyoung; Apio, Catherine; Park, Taesung.

Sci Rep ; 14(1): 9962, 2024 04 30.

Article En | MEDLINE | ID: mdl-38693172

The COVID-19 pandemic caused by the novel SARS-COV-2 virus poses a great risk to the world. During the COVID-19 pandemic, observing and forecasting several important indicators of the epidemic (like new confirmed cases, new cases in intensive care unit, and new deaths for each day) helped prepare the appropriate response (e.g., creating additional intensive care unit beds, and implementing strict interventions). Various predictive models and predictor variables have been used to forecast these indicators. However, the impact of prediction models and predictor variables on forecasting performance has not been systematically well analyzed. Here, we compared the forecasting performance using a linear mixed model in terms of prediction models (mathematical, statistical, and AI/machine learning models) and predictor variables (vaccination rate, stringency index, and Omicron variant rate) for seven selected countries with the highest vaccination rates. We decided on our best models based on the Bayesian Information Criterion (BIC) and analyzed the significance of each predictor. Simple models were preferred. The selection of the best prediction models and the use of Omicron variant rate were considered essential in improving prediction accuracies. For the test data period before Omicron variant emergence, the selection of the best models was the most significant factor in improving prediction accuracy. For the test period after Omicron emergence, Omicron variant rate use was considered essential in deciding forecasting accuracy. For prediction models, ARIMA, lightGBM, and TSGLM generally performed well in both test periods. Linear mixed models with country as a random effect has proven that the choice of prediction models and the use of Omicron data was significant in determining forecasting accuracies for the highly vaccinated countries. Relatively simple models, fit with either prediction model or Omicron data, produced best results in enhancing forecasting accuracies with test data.

COVID-19 Vaccines , COVID-19 , Forecasting , SARS-CoV-2 , Humans , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , Forecasting/methods , SARS-CoV-2/immunology , Vaccination , Machine Learning , Pandemics/prevention & control , Health Policy , Bayes Theorem , Models, Statistical

17.

Statistical design and analysis of controlled human malaria infection trials.

Tian, Xiaowen; Janes, Holly E; Kublin, James G.

Malar J ; 23(1): 133, 2024 May 03.

Article En | MEDLINE | ID: mdl-38702775

BACKGROUND: Malaria is a potentially life-threatening disease caused by Plasmodium protozoa transmitted by infected Anopheles mosquitoes. Controlled human malaria infection (CHMI) trials are used to assess the efficacy of interventions for malaria elimination. The operating characteristics of statistical methods for assessing the ability of interventions to protect individuals from malaria is uncertain in small CHMI studies. This paper presents simulation studies comparing the performance of a variety of statistical methods for assessing efficacy of intervention in CHMI trials. METHODS: Two types of CHMI designs were investigated: the commonly used single high-dose design (SHD) and the repeated low-dose design (RLD), motivated by simian immunodeficiency virus (SIV) challenge studies. In the context of SHD, the primary efficacy endpoint is typically time to infection. Using a continuous time survival model, five statistical tests for assessing the extent to which an intervention confers partial or full protection under single dose CHMI designs were evaluated. For RLD, the primary efficacy endpoint is typically the binary infection status after a specific number of challenges. A discrete time survival model was used to study the characteristics of RLD versus SHD challenge studies. RESULTS: In a SHD study with the continuous time survival model, log-rank test and t-test are the most powerful and provide more interpretable results than Wilcoxon rank-sum tests and Lachenbruch tests, while the likelihood ratio test is uniformly most powerful but requires knowledge of the underlying probability model. In the discrete time survival model setting, SHDs are more powerful for assessing the efficacy of an intervention to prevent infection than RLDs. However, additional information can be inferred from RLD challenge designs, particularly using a likelihood ratio test. CONCLUSIONS: Different statistical methods can be used to analyze controlled human malaria infection (CHMI) experiments, and the choice of method depends on the specific characteristics of the experiment, such as the sample size allocation between the control and intervention groups, and the nature of the intervention. The simulation results provide guidance for the trade off in statistical power when choosing between different statistical methods and study designs.

Malaria , Humans , Malaria/prevention & control , Animals , Research Design , Controlled Clinical Trials as Topic , Models, Statistical , Anopheles/parasitology

18.

Developing Bayesian EWMA chart for change detection in the shape parameter of Inverse Gaussian process.

Javed, Amara; Abbas, Tahir; Abbas, Nasir.

PLoS One ; 19(5): e0301259, 2024.

Article En | MEDLINE | ID: mdl-38709733

Bayesian Control charts are emerging as the most efficient statistical tools for monitoring manufacturing processes and providing effective control over process variability. The Bayesian approach is particularly suitable for addressing parametric uncertainty in the manufacturing industry. In this study, we determine the monitoring threshold for the shape parameter of the Inverse Gaussian distribution (IGD) and design different exponentially-weighted-moving-average (EWMA) control charts based on different loss functions (LFs). The impact of hyperparameters is investigated on Bayes estimates (BEs) and posterior risks (PRs). The performance measures such as average run length (ARL), standard deviation of run length (SDRL), and median of run length (MRL) are employed to evaluate the suggested approach. The designed Bayesian charts are evaluated for different settings of smoothing constant of the EWMA chart, different sample sizes, and pre-specified false alarm rates. The simulative study demonstrates the effectiveness of the suggested Bayesian method-based EWMA charts as compared to the conventional classical setup-based EWMA charts. The proposed techniques of EWMA charts are highly efficient in detecting shifts in the shape parameter and outperform their classical counterpart in detecting faults quickly. The proposed technique is also applied to real-data case studies from the aerospace manufacturing industry. The quality characteristic of interest was selected as the monthly industrial production index of aircraft from January 1980 to December 2022. The real-data-based findings also validate the conclusions based on the simulative results.

Bayes Theorem , Normal Distribution , Algorithms , Humans , Models, Statistical

19.

A review of common statistical methods for dealing with multiple pollutant mixtures and multiple exposures.

Zhu, Guiming; Wen, Yanchao; Cao, Kexin; He, Simin; Wang, Tong.

Front Public Health ; 12: 1377685, 2024.

Article En | MEDLINE | ID: mdl-38784575

Traditional environmental epidemiology has consistently focused on studying the impact of single exposures on specific health outcomes, considering concurrent exposures as variables to be controlled. However, with the continuous changes in environment, humans are increasingly facing more complex exposures to multi-pollutant mixtures. In this context, accurately assessing the impact of multi-pollutant mixtures on health has become a central concern in current environmental research. Simultaneously, the continuous development and optimization of statistical methods offer robust support for handling large datasets, strengthening the capability to conduct in-depth research on the effects of multiple exposures on health. In order to examine complicated exposure mixtures, we introduce commonly used statistical methods and their developments, such as weighted quantile sum, bayesian kernel machine regression, toxic equivalency analysis, and others. Delineating their applications, advantages, weaknesses, and interpretability of results. It also provides guidance for researchers involved in studying multi-pollutant mixtures, aiding them in selecting appropriate statistical methods and utilizing R software for more accurate and comprehensive assessments of the impact of multi-pollutant mixtures on human health.

Environmental Exposure , Environmental Pollutants , Humans , Bayes Theorem , Models, Statistical

20.

Sequential covariate-adjusted randomization via hierarchically minimizing Mahalanobis distance and marginal imbalance.

Yang, Haoyu; Qin, Yichen; Li, Yang; Hu, Feifang.

Biometrics ; 80(2)2024 Mar 27.

Article En | MEDLINE | ID: mdl-38801258

In comparative studies, covariate balance and sequential allocation schemes have attracted growing academic interest. Although many theoretically justified adaptive randomization methods achieve the covariate balance, they often allocate patients in pairs or groups. To better meet the practical requirements where the clinicians cannot wait for other participants to assign the current patient for some economic or ethical reasons, we propose a method that randomizes patients individually and sequentially. The proposed method conceptually separates the covariate imbalance, measured by the newly proposed modified Mahalanobis distance, and the marginal imbalance, that is the sample size difference between the 2 groups, and it minimizes them with an explicit priority order. Compared with the existing sequential randomization methods, the proposed method achieves the best possible covariate balance while maintaining the marginal balance directly, offering us more control of the randomization process. We demonstrate the superior performance of the proposed method through a wide range of simulation studies and real data analysis, and also establish theoretical guarantees for the proposed method in terms of both the convergence of the imbalance measure and the subsequent treatment effect estimation.

Computer Simulation , Randomized Controlled Trials as Topic , Humans , Randomized Controlled Trials as Topic/statistics & numerical data , Randomized Controlled Trials as Topic/methods , Biometry/methods , Models, Statistical , Data Interpretation, Statistical , Random Allocation , Sample Size , Algorithms