Search | VHL Search Portal

1.

A comprehensive investigation of statistical and machine learning approaches for predicting complex human diseases on genomic variants.

Wang, Chonghao; Zhang, Jing; Veldsman, Werner Pieter; Zhou, Xin; Zhang, Lu.

Brief Bioinform ; 24(1)2023 01 19.

Article in English | MEDLINE | ID: mdl-36585786

ABSTRACT

Quantifying an individual's risk for common diseases is an important goal of precision health. The polygenic risk score (PRS), which aggregates multiple risk alleles of candidate diseases, has emerged as a standard approach for identifying high-risk individuals. Although several studies have been performed to benchmark the PRS calculation tools and assess their potential to guide future clinical applications, some issues remain to be further investigated, such as lacking (i) various simulated data with different genetic effects; (ii) evaluation of machine learning models and (iii) evaluation on multiple ancestries studies. In this study, we systematically validated and compared 13 statistical methods, 5 machine learning models and 2 ensemble models using simulated data with additive and genetic interaction models, 22 common diseases with internal training sets, 4 common diseases with external summary statistics and 3 common diseases for trans-ancestry studies in UK Biobank. The statistical methods were better in simulated data from additive models and machine learning models have edges for data that include genetic interactions. Ensemble models are generally the best choice by integrating various statistical methods. LDpred2 outperformed the other standalone tools, whereas PRS-CS, lassosum and DBSLMM showed comparable performance. We also identified that disease heritability strongly affected the predictive performance of all methods. Both the number and effect sizes of risk SNPs are important; and sample size strongly influences the performance of all methods. For the trans-ancestry studies, we found that the performance of most methods became worse when training and testing sets were from different populations.

Subject(s)

Machine Learning , Multifactorial Inheritance , Humans , Risk Factors , Genomics , Genetic Predisposition to Disease , Genome-Wide Association Study/methods

2.

Subarachnoid Hemorrhage Trials: Cutting, Sliding, or Keeping mRS Scores and WFNS Grades.

Mistry, Akshitkumar M; Saver, Jeffrey; Mack, William; Kamel, Hooman; Elm, Jordan; Beall, Jonathan.

Stroke ; 55(3): 779-784, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38235584

ABSTRACT

Rigorous evidence generation with randomized controlled trials has lagged for aneurysmal subarachnoid hemorrhage (SAH) compared with other forms of acute stroke. Besides its lower incidence compared with other stroke subtypes, the presentation and outcome of patients with SAH also differ. This must be considered and adjusted for in designing pivotal randomized controlled trials of patients with SAH. Here, we show the effect of the unique expected distribution of the SAH severity at presentation (World Federation of Neurological Surgeons grade) on the outcome most used in pivotal stroke randomized controlled trials (modified Rankin Scale) and, consequently, on the sample size. Furthermore, we discuss the advantages and disadvantages of different options to analyze the outcome and control the expected distribution of the World Federation of Neurological Surgeons grades in addition to showing their effects on the sample size. Finally, we offer methods that investigators can adapt to more precisely understand the effect of common modified Rankin Scale analysis methods and trial eligibility pertaining to the World Federation of Neurological Surgeons grade in designing their large-scale SAH randomized controlled trials.

Subject(s)

Stroke , Subarachnoid Hemorrhage , Humans , Subarachnoid Hemorrhage/therapy , Subarachnoid Hemorrhage/surgery , Treatment Outcome , Neurosurgical Procedures , Neurosurgeons , Stroke/surgery

3.

Adaptive Randomization Method to Prevent Extreme Instances of Group Size and Covariate Imbalance in Stroke Trials.

Italiano, Dominic; Campbell, Bruce; Hill, Michael D; Johns, Hannah T; Churilov, Leonid.

Stroke ; 55(8): 1962-1972, 2024 Aug.

Article in English | MEDLINE | ID: mdl-38920051

ABSTRACT

BACKGROUND: A recent review of randomization methods used in large multicenter clinical trials within the National Institutes of Health Stroke Trials Network identified preservation of treatment allocation randomness, achievement of the desired group size balance between treatment groups, achievement of baseline covariate balance, and ease of implementation in practice as critical properties required for optimal randomization designs. Common-scale minimal sufficient balance (CS-MSB) adaptive randomization effectively controls for covariate imbalance between treatment groups while preserving allocation randomness but does not balance group sizes. This study extends the CS-MSB adaptive randomization method to achieve both group size and covariate balance while preserving allocation randomness in hyperacute stroke trials. METHODS: A full factorial in silico simulation study evaluated the performance of the proposed new CSSize-MSB adaptive randomization method in achieving group size balance, covariate balance, and allocation randomness compared with the original CS-MSB method. Data from 4 existing hyperacute stroke trials were used to investigate the performance of CSSize-MSB for a range of sample sizes and covariate numbers and types. A discrete-event simulation model created with AnyLogic was used to dynamically visualize the decision logic of the CSSize-MSB randomization process for communication with clinicians. RESULTS: The proposed new CSSize-MSB algorithm uniformly outperformed the CS-MSB algorithm in controlling for group size imbalance while maintaining comparable levels of covariate balance and allocation randomness in hyperacute stroke trials. This improvement was consistent across a distribution of simulated trials with varying levels of imbalance but was increasingly pronounced for trials with extreme cases of imbalance. The results were consistent across a range of trial data sets of different sizes and covariate numbers and types. CONCLUSIONS: The proposed adaptive CSSize-MSB algorithm successfully controls for group size imbalance in hyperacute stroke trials under various settings, and its logic can be readily explained to clinicians using dynamic visualization.

Subject(s)

Stroke , Humans , Sample Size , Randomized Controlled Trials as Topic/methods , Computer Simulation , Random Allocation , Research Design

4.

Effect of scanning duration and sample size on reliability in resting state fMRI dynamic causal modeling analysis.

Ma, Liangsuo; Braun, Sarah E; Steinberg, Joel L; Bjork, James M; Martin, Caitlin E; Keen Ii, Larry D; Moeller, F Gerard.

Neuroimage ; 292: 120604, 2024 Apr 15.

Article in English | MEDLINE | ID: mdl-38604537

ABSTRACT

Despite its widespread use, resting-state functional magnetic resonance imaging (rsfMRI) has been criticized for low test-retest reliability. To improve reliability, researchers have recommended using extended scanning durations, increased sample size, and advanced brain connectivity techniques. However, longer scanning runs and larger sample sizes may come with practical challenges and burdens, especially in rare populations. Here we tested if an advanced brain connectivity technique, dynamic causal modeling (DCM), can improve reliability of fMRI effective connectivity (EC) metrics to acceptable levels without extremely long run durations or extremely large samples. Specifically, we employed DCM for EC analysis on rsfMRI data from the Human Connectome Project. To avoid bias, we assessed four distinct DCMs and gradually increased sample sizes in a randomized manner across ten permutations. We employed pseudo true positive and pseudo false positive rates to assess the efficacy of shorter run durations (3.6, 7.2, 10.8, 14.4 min) in replicating the outcomes of the longest scanning duration (28.8 min) when the sample size was fixed at the largest (n = 160 subjects). Similarly, we assessed the efficacy of smaller sample sizes (n = 10, 20, , 150 subjects) in replicating the outcomes of the largest sample (n = 160 subjects) when the scanning duration was fixed at the longest (28.8 min). Our results revealed that the pseudo false positive rate was below 0.05 for all the analyses. After the scanning duration reached 10.8 min, which yielded a pseudo true positive rate of 92%, further extensions in run time showed no improvements in pseudo true positive rate. Expanding the sample size led to enhanced pseudo true positive rate outcomes, with a plateau at n = 70 subjects for the targeted top one-half of the largest ECs in the reference sample, regardless of whether the longest run duration (28.8 min) or the viable run duration (10.8 min) was employed. Encouragingly, smaller sample sizes exhibited pseudo true positive rates of approximately 80% for n = 20, and 90% for n = 40 subjects. These data suggest that advanced DCM analysis may be a viable option to attain reliable metrics of EC when larger sample sizes or run times are not feasible.

Subject(s)

Brain , Connectome , Magnetic Resonance Imaging , Humans , Magnetic Resonance Imaging/methods , Magnetic Resonance Imaging/standards , Sample Size , Connectome/methods , Connectome/standards , Reproducibility of Results , Brain/diagnostic imaging , Brain/physiology , Adult , Female , Male , Rest/physiology , Time Factors

5.

Triple challenges - Small sample size in both exposure and control groups to scan rare maternal outcomes in a signal identification approach: A simulation study.

Thai, Thuy N; Winterstein, Almut G; Suarez, Elizabeth A; He, Jiwei; Zhao, Yueqin; Zhang, Di; Stojanovic, Danijela; Liedtka, Jane; Anderson, Abby; Hernández-Muñoz, José J; Munoz, Monica; Liu, Wei; Dashevsky, Inna; Messenger-Jones, Elizabeth; Siranosian, Elizabeth; Maro, Judith C.

Am J Epidemiol ; 2024 Jun 24.

Article in English | MEDLINE | ID: mdl-38918039

ABSTRACT

There is a dearth of safety data on maternal outcomes after perinatal medication exposure. Data-mining for unexpected adverse event occurrence in existing datasets is a potentially useful approach. One method, the Poisson tree-based scan statistic (TBSS), assumes that the expected outcome counts, based on incidence of outcomes in the control group, are estimated without error. This assumption may be difficult to satisfy with a small control group. Our simulation study evaluated the effect of imprecise incidence proportions from the control group on TBSS' ability to identify maternal outcomes in pregnancy research. We simulated base case analyses with "true" expected incidence proportions and compared these to imprecise incidence proportions derived from sparse control samples. We varied parameters impacting Type I error and statistical power (exposure group size, outcome's incidence proportion, and effect size). We found that imprecise incidence proportions generated by a small control group resulted in inaccurate alerting, inflation of Type I error, and removal of very rare outcomes for TBSS analysis due to "zero" background counts. Ideally, the control size should be at least several times larger than the exposure size to limit the number of false positive alerts and retain statistical power for true alerts.

6.

Why is a small sample size not enough?

Cao, Ying; Chen, Ronald C; Katz, Aaron J.

Oncologist ; 2024 Jun 27.

Article in English | MEDLINE | ID: mdl-38934301

ABSTRACT

BACKGROUND: Clinical studies are often limited by resources available, which results in constraints on sample size. We use simulated data to illustrate study implications when the sample size is too small. METHODS AND RESULTS: Using 2 theoretical populations each with Nâ=â1000, we randomly sample 10 from each population and conduct a statistical comparison, to help make a conclusion about whether the 2 populations are different. This exercise is repeated for a total of 4 studies: 2 concluded that the 2 populations are statistically significantly different, while 2 showed no statistically significant difference. CONCLUSIONS: Our simulated examples demonstrate that sample sizes play important roles in clinical research. The results and conclusions, in terms of estimates of means, medians, Pearson correlations, chi-square test, and P values, are unreliable with small samples.

7.

Bayesian sample size determination in basket trials borrowing information between subsets.

Zheng, Haiyan; Grayling, Michael J; Mozgunov, Pavel; Jaki, Thomas; Wason, James M S.

Biostatistics ; 24(4): 1000-1016, 2023 10 18.

Article in English | MEDLINE | ID: mdl-35993875

ABSTRACT

Basket trials are increasingly used for the simultaneous evaluation of a new treatment in various patient subgroups under one overarching protocol. We propose a Bayesian approach to sample size determination in basket trials that permit borrowing of information between commensurate subsets. Specifically, we consider a randomized basket trial design where patients are randomly assigned to the new treatment or control within each trial subset ("subtrial" for short). Closed-form sample size formulae are derived to ensure that each subtrial has a specified chance of correctly deciding whether the new treatment is superior to or not better than the control by some clinically relevant difference. Given prespecified levels of pairwise (in)commensurability, the subtrial sample sizes are solved simultaneously. The proposed Bayesian approach resembles the frequentist formulation of the problem in yielding comparable sample sizes for circumstances of no borrowing. When borrowing is enabled between commensurate subtrials, a considerably smaller trial sample size is required compared to the widely implemented approach of no borrowing. We illustrate the use of our sample size formulae with two examples based on real basket trials. A comprehensive simulation study further shows that the proposed methodology can maintain the true positive and false positive rates at desired levels.

Subject(s)

Research Design , Humans , Sample Size , Bayes Theorem , Computer Simulation

8.

No evidence that ACE2 or TMPRSS2 drive population disparity in COVID risks.

Pearson, Nathaniel M; Novembre, John.

BMC Med ; 22(1): 337, 2024 Aug 26.

Article in English | MEDLINE | ID: mdl-39183295

ABSTRACT

Early in the SARS-CoV2 pandemic, in this journal, Hou et al. (BMC Med 18:216, 2020) interpreted public genotype data, run through functional prediction tools, as suggesting that members of particular human populations carry potentially COVID-risk-increasing variants in genes ACE2 and TMPRSS2 far more often than do members of other populations. Beyond resting on predictions rather than clinical outcomes, and focusing on variants too rare to typify population members even jointly, their claim mistook a well known artifact (that large samples reveal more of a population's variants than do small samples) as if showing real and congruent population differences for the two genes, rather than lopsided population sampling in their shared source data. We explain that artifact, and contrast it with empirical findings, now ample, that other loci shape personal COVID risks far more significantly than do ACE2 and TMPRSS2-and that variation in ACE2 and TMPRSS2 per se unlikely exacerbates any net population disparity in the effects of such more risk-informative loci.

Subject(s)

Angiotensin-Converting Enzyme 2 , COVID-19 , SARS-CoV-2 , Serine Endopeptidases , Humans , Angiotensin-Converting Enzyme 2/genetics , Angiotensin-Converting Enzyme 2/metabolism , COVID-19/genetics , COVID-19/epidemiology , Genetic Predisposition to Disease , SARS-CoV-2/genetics , Serine Endopeptidases/genetics

9.

Comparison of imputation and imputation-free methods for statistical analysis of mass spectrometry data with missing data.

Taylor, Sandra; Ponzini, Matthew; Wilson, Machelle; Kim, Kyoungmi.

Brief Bioinform ; 23(1)2022 01 17.

Article in English | MEDLINE | ID: mdl-34472591

ABSTRACT

Missing values are common in high-throughput mass spectrometry data. Two strategies are available to address missing values: (i) eliminate or impute the missing values and apply statistical methods that require complete data and (ii) use statistical methods that specifically account for missing values without imputation (imputation-free methods). This study reviews the effect of sample size and percentage of missing values on statistical inference for multiple methods under these two strategies. With increasing missingness, the ability of imputation and imputation-free methods to identify differentially and non-differentially regulated compounds in a two-group comparison study declined. Random forest and k-nearest neighbor imputation combined with a Wilcoxon test performed well in statistical testing for up to 50% missingness with little bias in estimating the effect size. Quantile regression imputation accompanied with a Wilcoxon test also had good statistical testing outcomes but substantially distorted the difference in means between groups. None of the imputation-free methods performed consistently better for statistical testing than imputation methods.

Subject(s)

Research Design , Bias , Cluster Analysis , Mass Spectrometry/methods

10.

Enhancing discoveries of molecular QTL studies with small sample size using summary statistic imputation.

Wang, Tao; Liu, Yongzhuang; Yin, Quanwei; Geng, Jiaquan; Chen, Jin; Yin, Xipeng; Wang, Yongtian; Shang, Xuequn; Tian, Chunwei; Wang, Yadong; Peng, Jiajie.

Brief Bioinform ; 23(1)2022 01 17.

Article in English | MEDLINE | ID: mdl-34545927

ABSTRACT

Quantitative trait locus (QTL) analyses of multiomic molecular traits, such as gene transcription (eQTL), DNA methylation (mQTL) and histone modification (haQTL), have been widely used to infer the functional effects of genome variants. However, the QTL discovery is largely restricted by the limited study sample size, which demands higher threshold of minor allele frequency and then causes heavy missing molecular trait-variant associations. This happens prominently in single-cell level molecular QTL studies because of sample availability and cost. It is urgent to propose a method to solve this problem in order to enhance discoveries of current molecular QTL studies with small sample size. In this study, we presented an efficient computational framework called xQTLImp to impute missing molecular QTL associations. In the local-region imputation, xQTLImp uses multivariate Gaussian model to impute the missing associations by leveraging known association statistics of variants and the linkage disequilibrium (LD) around. In the genome-wide imputation, novel procedures are implemented to improve efficiency, including dynamically constructing a reused LD buffer, adopting multiple heuristic strategies and parallel computing. Experiments on various multiomic bulk and single-cell sequencing-based QTL datasets have demonstrated high imputation accuracy and novel QTL discovery ability of xQTLImp. Finally, a C++ software package is freely available at https://github.com/stormlovetao/QTLIMP.

Subject(s)

Genome-Wide Association Study , Quantitative Trait Loci , Genome-Wide Association Study/methods , Genotype , Linkage Disequilibrium , Phenotype , Polymorphism, Single Nucleotide , Sample Size

11.

Time to Functional Loss as an Endpoint in Huntington's Disease Trials: Enrichment and Sample Size.

Mills, James A; Long, Jeffrey D; Vaidya, Jatin G; Gantman, Emily C; Sathe, Swati; Tabrizi, Sarah J; Sampaio, Cristina.

Mov Disord ; 2024 Aug 05.

Article in English | MEDLINE | ID: mdl-39101272

ABSTRACT

BACKGROUND: Clinical trial scenarios can be modeled using data from observational studies, providing critical information for design of real-world trials. The Huntington's Disease Integrated Staging System (HD-ISS) characterizes disease progression over an individual's lifespan and allows for flexibility in the design of trials with the goal of delaying progression. Enrichment methods can be applied to the HD-ISS to identify subgroups requiring smaller estimated sample sizes. OBJECTIVE: Investigate time to the event of functional decline (HD-ISS Stage 3) as an endpoint for trials in HD and present sample size estimates after enrichment. METHODS: We classified individuals from observational studies according to the HD-ISS. We assessed the ability of the prognostic index normed (PIN) and its components to predict time to HD-ISS Stage 3. For enrichment, we formed groups from deciles of the baseline PIN distribution for HD-ISS Stage 2 participants. We selected enrichment subgroups closer to Stage 3 transition and estimated sample sizes, using delay in the transition time as the effect size. RESULTS: In predicting time to HD-ISS Stage 3, PIN outperforms its components. Survival curves for each PIN decile show that groups with PIN from 1.48 to 2.74 have median time to Stage 3 of approximately 2 years and these are combined to create enrichment subgroups. Sample size estimates are presented by enrichment subgroup. CONCLUSIONS: PIN is predictive of functional decline. A delay of 9 months or more in the transition to Stage 3 for an enriched sample yields feasible sample size estimates, demonstrating that this approach can aid in planning future trials. © 2024 The Author(s). Movement Disorders published by Wiley Periodicals LLC on behalf of International Parkinson and Movement Disorder Society.

12.

Scalable Bayesian Divergence Time Estimation With Ratio Transformations.

Ji, Xiang; Fisher, Alexander A; Su, Shuo; Thorne, Jeffrey L; Potter, Barney; Lemey, Philippe; Baele, Guy; Suchard, Marc A.

Syst Biol ; 72(5): 1136-1153, 2023 11 01.

Article in English | MEDLINE | ID: mdl-37458991

ABSTRACT

Divergence time estimation is crucial to provide temporal signals for dating biologically important events from species divergence to viral transmissions in space and time. With the advent of high-throughput sequencing, recent Bayesian phylogenetic studies have analyzed hundreds to thousands of sequences. Such large-scale analyses challenge divergence time reconstruction by requiring inference on highly correlated internal node heights that often become computationally infeasible. To overcome this limitation, we explore a ratio transformation that maps the original $N-1$ internal node heights into a space of one height parameter and $N-2$ ratio parameters. To make the analyses scalable, we develop a collection of linear-time algorithms to compute the gradient and Jacobian-associated terms of the log-likelihood with respect to these ratios. We then apply Hamiltonian Monte Carlo sampling with the ratio transform in a Bayesian framework to learn the divergence times in 4 pathogenic viruses (West Nile virus, rabies virus, Lassa virus, and Ebola virus) and the coralline red algae. Our method both resolves a mixing issue in the West Nile virus example and improves inference efficiency by at least 5-fold for the Lassa and rabies virus examples as well as for the algae example. Our method now also makes it computationally feasible to incorporate mixed-effects molecular clock models for the Ebola virus example, confirms the findings from the original study, and reveals clearer multimodal distributions of the divergence times of some clades of interest.

Subject(s)

Algorithms , Phylogeny , Bayes Theorem , Time Factors , Monte Carlo Method

13.

A rank-based approach to evaluate a surrogate marker in a small sample setting.

Parast, Layla; Cai, Tianxi; Tian, Lu.

Biometrics ; 80(1)2024 Jan 29.

Article in English | MEDLINE | ID: mdl-38386359

ABSTRACT

In clinical studies of chronic diseases, the effectiveness of an intervention is often assessed using "high cost" outcomes that require long-term patient follow-up and/or are invasive to obtain. While much progress has been made in the development of statistical methods to identify surrogate markers, that is, measurements that could replace such costly outcomes, they are generally not applicable to studies with a small sample size. These methods either rely on nonparametric smoothing which requires a relatively large sample size or rely on strict model assumptions that are unlikely to hold in practice and empirically difficult to verify with a small sample size. In this paper, we develop a novel rank-based nonparametric approach to evaluate a surrogate marker in a small sample size setting. The method developed in this paper is motivated by a small study of children with nonalcoholic fatty liver disease (NAFLD), a diagnosis for a range of liver conditions in individuals without significant history of alcohol intake. Specifically, we examine whether change in alanine aminotransferase (ALT; measured in blood) is a surrogate marker for change in NAFLD activity score (obtained by biopsy) in a trial, which compared Vitamin E ($n=50$) versus placebo ($n=46$) among children with NAFLD.

Subject(s)

Non-alcoholic Fatty Liver Disease , Child , Humans , Non-alcoholic Fatty Liver Disease/diagnosis , Biomarkers , Biopsy , Sample Size

14.

Determining sample size in a personalized randomized controlled (PRACTical) trial.

Turner, Rebecca M; Lee, Kim May; Walker, A Sarah; Ellis, Sally; Sharland, Michael; Bielicki, Julia A; Stöhr, Wolfgang; White, Ian R.

Stat Med ; 43(21): 4098-4112, 2024 Sep 20.

Article in English | MEDLINE | ID: mdl-38980954

ABSTRACT

In clinical settings with no commonly accepted standard-of-care, multiple treatment regimens are potentially useful, but some treatments may not be appropriate for some patients. A personalized randomized controlled trial (PRACTical) design has been proposed for this setting. For a network of treatments, each patient is randomized only among treatments which are appropriate for them. The aim is to produce treatment rankings that can inform clinical decisions about treatment choices for individual patients. Here we propose methods for determining sample size in a PRACTical design, since standard power-based methods are not applicable. We derive a sample size by evaluating information gained from trials of varying sizes. For a binary outcome, we quantify how many adverse outcomes would be prevented by choosing the top-ranked treatment for each patient based on trial results rather than choosing a random treatment from the appropriate personalized randomization list. In simulations, we evaluate three performance measures: mean reduction in adverse outcomes using sample information, proportion of simulated patients for whom the top-ranked treatment performed as well or almost as well as the best appropriate treatment, and proportion of simulated trials in which the top-ranked treatment performed better than a randomly chosen treatment. We apply the methods to a trial evaluating eight different combination antibiotic regimens for neonatal sepsis (NeoSep1), in which a PRACTical design addresses varying patterns of antibiotic choice based on disease characteristics and resistance. Our proposed approach produces results that are more relevant to complex decision making by clinicians and policy makers.

Subject(s)

Precision Medicine , Randomized Controlled Trials as Topic , Humans , Randomized Controlled Trials as Topic/methods , Sample Size , Precision Medicine/methods , Computer Simulation , Infant, Newborn , Sepsis/drug therapy , Models, Statistical

15.

A seamless phase II/III design with dose optimization for oncology drug development.

Li, Yuhan; Zhang, Yiding; Mi, Gu; Lin, Ji.

Stat Med ; 43(18): 3383-3402, 2024 Aug 15.

Article in English | MEDLINE | ID: mdl-38845095

ABSTRACT

The US FDA's Project Optimus initiative that emphasizes dose optimization prior to marketing approval represents a pivotal shift in oncology drug development. It has a ripple effect for rethinking what changes may be made to conventional pivotal trial designs to incorporate a dose optimization component. Aligned with this initiative, we propose a novel seamless phase II/III design with dose optimization (SDDO framework). The proposed design starts with dose optimization in a randomized setting, leading to an interim analysis focused on optimal dose selection, trial continuation decisions, and sample size re-estimation (SSR). Based on the decision at interim analysis, patient enrollment continues for both the selected dose arm and control arm, and the significance of treatment effects will be determined at final analysis. The SDDO framework offers increased flexibility and cost-efficiency through sample size adjustment, while stringently controlling the Type I error. This proposed design also facilitates both accelerated approval (AA) and regular approval in a "one-trial" approach. Extensive simulation studies confirm that our design reliably identifies the optimal dosage and makes preferable decisions with a reduced sample size while retaining statistical power.

Subject(s)

Antineoplastic Agents , Clinical Trials, Phase II as Topic , Clinical Trials, Phase III as Topic , Drug Development , Humans , Clinical Trials, Phase II as Topic/methods , Antineoplastic Agents/administration & dosage , Antineoplastic Agents/therapeutic use , Drug Development/methods , Sample Size , Computer Simulation , Dose-Response Relationship, Drug , Research Design , United States , United States Food and Drug Administration , Drug Approval , Randomized Controlled Trials as Topic , Neoplasms/drug therapy

16.

Sample size determination for prediction models via learning-type curves.

Dayimu, Alimu; Simidjievski, Nikola; Demiris, Nikolaos; Abraham, Jean.

Stat Med ; 43(16): 3062-3072, 2024 Jul 20.

Article in English | MEDLINE | ID: mdl-38803150

ABSTRACT

This article is concerned with sample size determination methodology for prediction models. We propose to combine the individual calculations via learning-type curves. We suggest two distinct ways of doing so, a deterministic skeleton of a learning curve and a Gaussian process centered upon its deterministic counterpart. We employ several learning algorithms for modeling the primary endpoint and distinct measures for trial efficacy. We find that the performance may vary with the sample size, but borrowing information across sample size universally improves the performance of such calculations. The Gaussian process-based learning curve appears more robust and statistically efficient, while computational efficiency is comparable. We suggest that anchoring against historical evidence when extrapolating sample sizes should be adopted when such data are available. The methods are illustrated on binary and survival endpoints.

Subject(s)

Algorithms , Models, Statistical , Humans , Sample Size , Learning Curve , Normal Distribution , Computer Simulation , Survival Analysis

17.

Sample size calculation for comparing two screening tests when the gold standard is missing at random.

Wu, Yougui.

Stat Med ; 43(15): 2944-2956, 2024 Jul 10.

Article in English | MEDLINE | ID: mdl-38747112

ABSTRACT

Sample size formulas have been proposed for comparing two sensitivities (specificities) in the presence of verification bias under a paired design. However, the existing sample size formulas involve lengthy calculations of derivatives and are too complicated to implement. In this paper, we propose alternative sample size formulas for each of three existing tests, two Wald tests and one weighted McNemar's test. The proposed sample size formulas are more intuitive and simpler to implement than their existing counterparts. Furthermore, by comparing the sample sizes calculated based on the three tests, we can show that the three tests have similar sample sizes even though the weighted McNemar's test only use the data from discordant pairs whereas the two Wald tests also use the additional data from accordant pairs.

Subject(s)

Sensitivity and Specificity , Sample Size , Humans , Models, Statistical , Bias , Computer Simulation

18.

On the distribution of the power function for the scale parameter of exponential families.

De Santis, Fulvio; Gubbiotti, Stefania.

Stat Med ; 43(10): 1973-1992, 2024 May 10.

Article in English | MEDLINE | ID: mdl-38634314

ABSTRACT

The expected value of the standard power function of a test, computed with respect to a design prior distribution, is often used to evaluate the probability of success of an experiment. However, looking only at the expected value might be reductive. Instead, the whole probability distribution of the power function induced by the design prior can be exploited. In this article we consider one-sided testing for the scale parameter of exponential families and we derive general unifying expressions for cumulative distribution and density functions of the random power. Sample size determination criteria based on alternative summaries of these functions are discussed. The study sheds light on the relevance of the choice of the design prior in order to construct a successful experiment.

Subject(s)

Bayes Theorem , Humans , Probability , Sample Size

19.

Designing individually randomized group treatment trials with repeated outcome measurements using generalized estimating equations.

Wang, Xueqi; Turner, Elizabeth L; Li, Fan.

Stat Med ; 43(2): 358-378, 2024 01 30.

Article in English | MEDLINE | ID: mdl-38009329

ABSTRACT

Individually randomized group treatment (IRGT) trials, in which the clustering of outcome is induced by group-based treatment delivery, are increasingly popular in public health research. IRGT trials frequently incorporate longitudinal measurements, of which the proper sample size calculations should account for correlation structures reflecting both the treatment-induced clustering and repeated outcome measurements. Given the relatively sparse literature on designing longitudinal IRGT trials, we propose sample size procedures for continuous and binary outcomes based on the generalized estimating equations approach, employing the block exchangeable correlation structures with different correlation parameters for the treatment arm and for the control arm, and surveying five marginal mean models with different assumptions of time effect: no-time constant treatment effect, linear-time constant treatment effect, categorical-time constant treatment effect, linear time by treatment interaction, and categorical time by treatment interaction. Closed-form sample size formulas are derived for continuous outcomes, which depends on the eigenvalues of the correlation matrices; detailed numerical sample size procedures are proposed for binary outcomes. Through simulations, we demonstrate that the empirical power agrees well with the predicted power, for as few as eight groups formed in the treatment arm, when data are analyzed using the matrix-adjusted estimating equations for the correlation parameters with a bias-corrected sandwich variance estimator.

Subject(s)

Models, Statistical , Research Design , Humans , Sample Size , Bias , Cluster Analysis , Computer Simulation

20.

Sample size adaptation designs and efficiency comparison with group sequential designs.

Cui, Lu.

Stat Med ; 43(11): 2203-2215, 2024 May 20.

Article in English | MEDLINE | ID: mdl-38545849

ABSTRACT

This study is to give a systematic account of sample size adaptation designs (SSADs) and to provide direct proof of the efficiency advantage of general SSADs over group sequential designs (GSDs) from a different perspective. For this purpose, a class of sample size mapping functions to define SSADs is introduced. Under the two-stage adaptive clinical trial setting, theorems are developed to describe the properties of SSADs. Sufficient conditions are derived and used to prove analytically that SSADs based on the weighted combination test can be uniformly more efficient than GSDs in a range of likely values of the true treatment difference Î´ $$ \delta $$ . As shown in various scenarios, given a GSD, a fully adaptive SSAD can be obtained that has sufficient statistical power similar to that of the GSD but has a smaller average sample size for all Î´ $$ \delta $$ in the range. The associated sample size savings can be substantial. A practical design example and suggestions on the steps to find efficient SSADs are also provided.

Subject(s)

Research Design , Sample Size , Humans , Models, Statistical , Adaptive Clinical Trials as Topic/statistics & numerical data , Adaptive Clinical Trials as Topic/methods , Computer Simulation , Clinical Trials as Topic/methods

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL