Search | VHL Regional Portal

1.

Sample size calculation for data reliability and diagnostic performance: a go-to review.

Monti, Caterina Beatrice; Ambrogi, Federico; Sardanelli, Francesco.

Eur Radiol Exp ; 8(1): 79, 2024 Jul 05.

Article in English | MEDLINE | ID: mdl-38965128

ABSTRACT

Sample size, namely the number of subjects that should be included in a study to reach the desired endpoint and statistical power, is a fundamental concept of scientific research. Indeed, sample size must be planned a priori, and tailored to the main endpoint of the study, to avoid including too many subjects, thus possibly exposing them to additional risks while also wasting time and resources, or too few subjects, failing to reach the desired purpose. We offer a simple, go-to review of methods for sample size calculation for studies concerning data reliability (repeatability/reproducibility) and diagnostic performance. For studies concerning data reliability, we considered Cohen's κ or intraclass correlation coefficient (ICC) for hypothesis testing, estimation of Cohen's κ or ICC, and Bland-Altman analyses. With regards to diagnostic performance, we considered accuracy or sensitivity/specificity versus reference standards, the comparison of diagnostic performances, and the comparisons of areas under the receiver operating characteristics curve. Finally, we considered the special cases of dropouts or retrospective case exclusions, multiple endpoints, lack of prior data estimates, and the selection of unusual thresholds for α and ß errors. For the most frequent cases, we provide example of software freely available on the Internet.Relevance statement Sample size calculation is a fundamental factor influencing the quality of studies on repeatability/reproducibility and diagnostic performance in radiology.Key pointsâ¢ Sample size is a concept related to precision and statistical power.â¢ It has ethical implications, especially when patients are exposed to risks.â¢ Sample size should always be calculated before starting a study.â¢ This review offers simple, go-to methods for sample size calculations.

Subject(s)

Research Design , Sample Size , Humans , Reproducibility of Results

2.

An evaluation of sample size requirements for developing risk prediction models with binary outcomes.

Pavlou, Menelaos; Ambler, Gareth; Qu, Chen; Seaman, Shaun R; White, Ian R; Omar, Rumana Z.

BMC Med Res Methodol ; 24(1): 146, 2024 Jul 10.

Article in English | MEDLINE | ID: mdl-38987715

ABSTRACT

BACKGROUND: Risk prediction models are routinely used to assist in clinical decision making. A small sample size for model development can compromise model performance when the model is applied to new patients. For binary outcomes, the calibration slope (CS) and the mean absolute prediction error (MAPE) are two key measures on which sample size calculations for the development of risk models have been based. CS quantifies the degree of model overfitting while MAPE assesses the accuracy of individual predictions. METHODS: Recently, two formulae were proposed to calculate the sample size required, given anticipated features of the development data such as the outcome prevalence and c-statistic, to ensure that the expectation of the CS and MAPE (over repeated samples) in models fitted using MLE will meet prespecified target values. In this article, we use a simulation study to evaluate the performance of these formulae. RESULTS: We found that both formulae work reasonably well when the anticipated model strength is not too high (c-statistic < 0.8), regardless of the outcome prevalence. However, for higher model strengths the CS formula underestimates the sample size substantially. For example, for c-statistic = 0.85 and 0.9, the sample size needed to be increased by at least 50% and 100%, respectively, to meet the target expected CS. On the other hand, the MAPE formula tends to overestimate the sample size for high model strengths. These conclusions were more pronounced for higher prevalence than for lower prevalence. Similar results were drawn when the outcome was time to event with censoring. Given these findings, we propose a simulation-based approach, implemented in the new R package 'samplesizedev', to correctly estimate the sample size even for high model strengths. The software can also calculate the variability in CS and MAPE, thus allowing for assessment of model stability. CONCLUSIONS: The calibration and MAPE formulae suggest sample sizes that are generally appropriate for use when the model strength is not too high. However, they tend to be biased for higher model strengths, which are not uncommon in clinical risk prediction studies. On those occasions, our proposed adjustments to the sample size calculations will be relevant.

Subject(s)

Models, Statistical , Humans , Sample Size , Risk Assessment/methods , Risk Assessment/statistics & numerical data , Computer Simulation , Algorithms

3.

How many Monte Carlo samples are needed for probabilistic cost-effectiveness analyses?

Yaesoubi, Reza.

Value Health ; 2024 Jul 06.

Article in English | MEDLINE | ID: mdl-38977192

ABSTRACT

OBJECTIVE: Probabilistic sensitivity analysis (PSA) is conducted to account for the uncertainty in cost and effect of decision options under consideration. PSA involves obtaining a large sample of input parameter values (N) to estimate the expected cost and effect of each alternative in the presence of parameter uncertainty. When the analysis involves using stochastic models (e.g., individual-level models), the model is further replicated P times for each sampled parameter set. We study how N and P should be determined. METHODS: We show that PSA could be structured such that P can be an arbitrary number (say, P=1). To determine N, we derive a formula based on Chebyshev's inequality such that the error in estimating the incremental cost-effectiveness ratio (ICER) of alternatives (or equivalently, the willingness-to-pay value at which the optimal decision option changes) is within a desired level of accuracy. We described two methods to confirmed, visually and quantitatively, that the N informed by this method results in ICER estimates within the specified level of accuracy. RESULTS: When N is arbitrarily selected, the estimated ICERs could be substantially different from the true ICER (even as P increases), which could lead misleading conclusions. Using a simple resource allocation model, we demonstrate that the proposed approach can minimize the potential for this error. CONCLUSIONS: The number of parameter samples in probabilistic CEAs should not be arbitrarily selected. We describe three methods to ensure that enough parameter samples are used in probabilistic CEAs.

4.

Primary outcomes and anticipated effect sizes in randomised clinical trials assessing adjuncts to peripheral nerve blocks: A scoping review.

Flyger, Sarah Sofie Bitsch; Sorenson, Sandra; Pingel, Lasse; Karlsen, Anders Peder Højer; Nørskov, Anders Kehlet; Mathiesen, Ole; Maagaard, Mathias.

Acta Anaesthesiol Scand ; 2024 Jul 08.

Article in English | MEDLINE | ID: mdl-38978187

ABSTRACT

BACKGROUND: Prolonging effects of adjuncts to local anaesthetics in peripheral nerve blocks have been demonstrated in randomised clinical trials. The chosen primary outcome and anticipated effect size have major impact on the clinical relevance of results in these trials. This scoping review aims to provide an overview of frequently used outcomes and anticipated effect sizes in randomised trials on peripheral nerve block adjuncts. METHODS: For our scoping review, we searched MEDLINE, Embase and CENTRAL for trials assessing effects of adjuncts for peripheral nerve blocks published in 10 major anaesthesia journals. We included randomised clinical trials assessing adjuncts for single-shot ultrasound-guided peripheral nerve blocks, regardless of the type of interventional adjunct and control group, local anaesthetic used and anatomical localization. Our primary outcome was the choice of primary outcomes and corresponding anticipated effect size used for sample size estimation. Secondary outcomes were assessor of primary outcomes, the reporting of sample size calculations and statistically significant and non-significant results related to the anticipated effect sizes. RESULTS: Of 11,854 screened trials, we included 59. The most frequent primary outcome was duration of analgesia (35/59 trials, 59%) with absolute and relative median (interquartile range) anticipated effect sizes for adjunct versus placebo/no adjunct: 240 min (180-318) and 30% (25-40) and for adjunct versus active comparator: 210 min (180-308) and 17% (15-28). Adequate sample size calculations were reported in 78% of trials. Statistically significant results were reported for primary outcomes in 45/59 trials (76%), of which 22% did not reach the anticipated effect size. CONCLUSION: The reported outcomes and associated anticipated effect sizes can be used in future trials on adjuncts for peripheral nerve blocks to increase methodological homogeneity.

5.

The score-goldilocks design for phase 3 clinical trials.

Li, Yingqiu; Zhang, Xun; Weng, Zhimao.

J Biopharm Stat ; : 1-10, 2024 Jul 12.

Article in English | MEDLINE | ID: mdl-39001557

ABSTRACT

In this paper, we propose a new Bayesian adaptive design, score-goldilocks design, which has the same algorithmic idea as goldilocks design. The score-goldilocks design leads to a uniform formula for calculating the probability of trial success for different endpoint trials by using the normal approximation. The simulation results show that the score-goldilocks design is not only very similar to the goldilocks design in terms of operating characteristics such as type 1 error, power, average sample size, probability of stop for futility, and probability of early stop for success, but also greatly saves the calculation time and improves the operation efficiency.

6.

Sample Size Calculation for an Individual Stepped-Wedge Randomized Trial.

Allemang-Trivalle, Aude; Maruani, Annabel; Giraudeau, Bruno.

Biom J ; 66(5): e202300167, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38988194

ABSTRACT

In the individual stepped-wedge randomized trial (ISW-RT), subjects are allocated to sequences, each sequence being defined by a control period followed by an experimental period. The total follow-up time is the same for all sequences, but the duration of the control and experimental periods varies among sequences. To our knowledge, there is no validated sample size calculation formula for ISW-RTs unlike stepped-wedge cluster randomized trials (SW-CRTs). The objective of this study was to adapt the formula used for SW-CRTs to the case of individual randomization and to validate this adaptation using a Monte Carlo simulation study. The proposed sample size calculation formula for an ISW-RT design yielded satisfactory empirical power for most scenarios except scenarios with operating characteristic values near the boundary (i.e., smallest possible number of periods, very high or very low autocorrelation coefficient). Overall, the results provide useful insights into the sample size calculation for ISW-RTs.

Subject(s)

Monte Carlo Method , Randomized Controlled Trials as Topic , Sample Size , Humans , Biometry/methods

7.

Determining an Appropriate Sample Size for Qualitative Interviews to Achieve True and Near Code Saturation: Secondary Analysis of Data.

Squire, Claudia M; Giombi, Kristen C; Rupert, Douglas J; Amoozegar, Jacqueline; Williams, Peyton.

J Med Internet Res ; 26: e52998, 2024 Jul 09.

Article in English | MEDLINE | ID: mdl-38980711

ABSTRACT

BACKGROUND: In-depth interviews are a common method of qualitative data collection, providing rich data on individuals' perceptions and behaviors that would be challenging to collect with quantitative methods. Researchers typically need to decide on sample size a priori. Although studies have assessed when saturation has been achieved, there is no agreement on the minimum number of interviews needed to achieve saturation. To date, most research on saturation has been based on in-person data collection. During the COVID-19 pandemic, web-based data collection became increasingly common, as traditional in-person data collection was possible. Researchers continue to use web-based data collection methods post the COVID-19 emergency, making it important to assess whether findings around saturation differ for in-person versus web-based interviews. OBJECTIVE: We aimed to identify the number of web-based interviews needed to achieve true code saturation or near code saturation. METHODS: The analyses for this study were based on data from 5 Food and Drug Administration-funded studies conducted through web-based platforms with patients with underlying medical conditions or with health care providers who provide primary or specialty care to patients. We extracted code- and interview-specific data and examined the data summaries to determine when true saturation or near saturation was reached. RESULTS: The sample size used in the 5 studies ranged from 30 to 70 interviews. True saturation was reached after 91% to 100% (n=30-67) of planned interviews, whereas near saturation was reached after 33% to 60% (n=15-23) of planned interviews. Studies that relied heavily on deductive coding and studies that had a more structured interview guide reached both true saturation and near saturation sooner. We also examined the types of codes applied after near saturation had been reached. In 4 of the 5 studies, most of these codes represented previously established core concepts or themes. Codes representing newly identified concepts, other or miscellaneous responses (eg, "in general"), uncertainty or confusion (eg, "don't know"), or categorization for analysis (eg, correct as compared with incorrect) were less commonly applied after near saturation had been reached. CONCLUSIONS: This study provides support that near saturation may be a sufficient measure to target and that conducting additional interviews after that point may result in diminishing returns. Factors to consider in determining how many interviews to conduct include the structure and type of questions included in the interview guide, the coding structure, and the population under study. Studies with less structured interview guides, studies that rely heavily on inductive coding and analytic techniques, and studies that include populations that may be less knowledgeable about the topics discussed may require a larger sample size to reach an acceptable level of saturation. Our findings also build on previous studies looking at saturation for in-person data collection conducted at a small number of sites.

Subject(s)

COVID-19 , Interviews as Topic , Humans , Sample Size , Interviews as Topic/methods , Qualitative Research , SARS-CoV-2 , Pandemics , Data Collection/methods , Internet

8.

Determining sample size in a personalized randomized controlled (PRACTical) trial.

Turner, Rebecca M; Lee, Kim May; Walker, A Sarah; Ellis, Sally; Sharland, Michael; Bielicki, Julia A; Stöhr, Wolfgang; White, Ian R.

Stat Med ; 2024 Jul 09.

Article in English | MEDLINE | ID: mdl-38980954

ABSTRACT

In clinical settings with no commonly accepted standard-of-care, multiple treatment regimens are potentially useful, but some treatments may not be appropriate for some patients. A personalized randomized controlled trial (PRACTical) design has been proposed for this setting. For a network of treatments, each patient is randomized only among treatments which are appropriate for them. The aim is to produce treatment rankings that can inform clinical decisions about treatment choices for individual patients. Here we propose methods for determining sample size in a PRACTical design, since standard power-based methods are not applicable. We derive a sample size by evaluating information gained from trials of varying sizes. For a binary outcome, we quantify how many adverse outcomes would be prevented by choosing the top-ranked treatment for each patient based on trial results rather than choosing a random treatment from the appropriate personalized randomization list. In simulations, we evaluate three performance measures: mean reduction in adverse outcomes using sample information, proportion of simulated patients for whom the top-ranked treatment performed as well or almost as well as the best appropriate treatment, and proportion of simulated trials in which the top-ranked treatment performed better than a randomly chosen treatment. We apply the methods to a trial evaluating eight different combination antibiotic regimens for neonatal sepsis (NeoSep1), in which a PRACTical design addresses varying patterns of antibiotic choice based on disease characteristics and resistance. Our proposed approach produces results that are more relevant to complex decision making by clinicians and policy makers.

9.

Group sequential design for time-to-event outcome with non-proportional hazards using the concept of relative time utilizing two different Weibull distributions.

Phadnis, Milind A; Thewarapperuma, Nadeesha; Mayo, Matthew S.

Contemp Clin Trials Commun ; 40: 101315, 2024 Aug.

Article in English | MEDLINE | ID: mdl-39036558

ABSTRACT

A group sequential design allows investigators to sequentially monitor efficacy and safety as part of interim testing in phase III trials. Literature is well developed in the case of continuous and binary outcomes, however, in case of trials with a time-to-event outcome, popular methods of sample size calculation often assume proportional hazards. In situations where the proportional hazards assumption is inappropriate as indicated by historical data, these popular methods are very restrictive. In this paper, a novel simulation-based group sequential design is proposed for a two-arm randomized phase III clinical trial with a survival endpoint for the non-proportional hazards scenario. By assuming that the survival times for each treatment arm follow two different Weibull distributions, the proposed method utilizes the concept of Relative Time to calculate the efficacy and safety boundaries at selected interim testing points. The test statistic used to generate these boundaries is asymptotically normal, allowing p-value calculation at each boundary. Many design features specific to time-to-event data can be incorporated with ease. Additionally, the proposed method allows the flexibility of having the accelerated failure time model and the proportional hazards model as constrained special cases. Real life applications are discussed demonstrating the practicality of the proposed method.

10.

Global Perfusion Practice Survey: Readiness of On-Call and Emergency Operation Rooms.

Butt, Salman Pervaiz; Saleem, Yasir; Raposo, Nuno; Darr, Umer; Bhatnagar, Gopal.

Braz J Cardiovasc Surg ; 39(4): e20230236, 2024 Jul 22.

Article in English | MEDLINE | ID: mdl-39038115

ABSTRACT

INTRODUCTION: Perfusion safety in cardiac surgery is vital, and this survey explores perfusion practices, perspectives, and challenges related to it. Specifically, it examines the readiness of on-call and emergency operation rooms for perfusion-related procedures during urgent situations. The aim is to identify gaps and enhance perfusion safety protocols, ultimately improving patient care. METHODS: This was a preliminary survey conducted as an initial exploration before committing to a comprehensive study. The sample size was primarily determined based on a one-month time frame. The survey collected data from 236 healthcare professionals, including cardiac surgeons, perfusionists, and anesthetists, using an online platform. Ethical considerations ensured participant anonymity and voluntary participation. The survey comprised multiple-choice and open-ended questions to gather quantitative and qualitative data. RESULTS: The survey found that 53% preferred a dry circuit ready for emergencies, 19.9% preferred primed circuits, and 19.1% chose not to have a ready pump at all. Various reasons influenced these choices, including caseload variations, response times, historical practices, surgeon preferences, and backup perfusionist availability. Infection risk, concerns about error, and team dynamics were additional factors affecting circuit readiness. CONCLUSION: This survey sheds light on current perfusion practices and challenges, emphasizing the importance of standardized protocols in regards to readiness of on-call and emergency operation rooms. It provides valuable insights for advancing perfusion safety and patient care while contributing to the existing literature on the subject.

Subject(s)

Operating Rooms , Humans , Surveys and Questionnaires , Perfusion/methods , Cardiac Surgical Procedures , Patient Safety , Emergency Service, Hospital/organization & administration

11.

Sample Size Determination and Study Design Impact on Dose-Scale Pharmacodynamic Bioequivalence: a Case Study Using Orlistat.

Xu, Lian; Li, Sanwang; Wu, Wei; Cheng, Zeneng; Xie, Feifan.

AAPS J ; 26(4): 77, 2024 Jul 03.

Article in English | MEDLINE | ID: mdl-38960976

ABSTRACT

Dose-scale pharmacodynamic bioequivalence is recommended for evaluating the consistency of generic and innovator formulations of certain locally acting drugs, such as orlistat. This study aimed to investigate the standard methodology for sample size determination and the impact of study design on dose-scale pharmacodynamic bioequivalence using orlistat as the model drug. A population pharmacodynamic model of orlistat was developed using NONMEM 7.5.1 and utilized for subsequent simulations. Three different study designs were evaluated across various predefined relative bioavailability ratios of test/reference (T/R) formulations. These designs included Study Design 1 (2×1 crossover with T1 60 mg, R1 60 mg, and R2 120 mg), Study Design 2 (2×1 crossover with T2 120 mg, R1 60 mg, and R2 120 mg), and Study Design 3 (2×2 crossover with T1 60 mg, T2 120 mg, R1 60 mg, and R2 120 mg). Sample sizes were determined using a stochastic simulation and estimation approach. Under the same T/R ratio and power, Study Design 3 required the minimum sample size for bioequivalence, followed by Study Design 1, while Study Design 2 performed the worst. For Study Designs 1 and 3, a larger sample size was needed on the T/R ratio < 1.0 side for the same power compared to that on the T/R ratio > 1.0 side. The opposite asymmetry was observed for Study Design 2. We demonstrated that Study Design 3 is most effective for reducing the sample size for orlistat bioequivalence studies, and the impact of T/R ratio on sample size shows asymmetry.

Subject(s)

Cross-Over Studies , Orlistat , Therapeutic Equivalency , Orlistat/pharmacokinetics , Orlistat/administration & dosage , Humans , Sample Size , Research Design , Biological Availability , Models, Biological , Anti-Obesity Agents/pharmacokinetics , Anti-Obesity Agents/administration & dosage , Lactones/pharmacokinetics , Lactones/administration & dosage , Computer Simulation , Dose-Response Relationship, Drug

12.

Hypothesis testing and sample size considerations for the test-negative design.

Huo, Yanan; Yang, Yang; Halloran, M Elizabeth; Longini, Ira M; Dean, Natalie E.

BMC Med Res Methodol ; 24(1): 151, 2024 Jul 16.

Article in English | MEDLINE | ID: mdl-39014324

ABSTRACT

The test-negative design (TND) is an observational study design to evaluate vaccine effectiveness (VE) that enrolls individuals receiving diagnostic testing for a target disease as part of routine care. VE is estimated as one minus the adjusted odds ratio of testing positive versus negative comparing vaccinated and unvaccinated patients. Although the TND is related to case-control studies, it is distinct in that the ratio of test-positive cases to test-negative controls is not typically pre-specified. For both types of studies, sparse cells are common when vaccines are highly effective. We consider the implications of these features on power for the TND. We use simulation studies to explore three hypothesis-testing procedures and associated sample size calculations for case-control and TND studies. These tests, all based on a simple logistic regression model, are a standard Wald test, a continuity-corrected Wald test, and a score test. The Wald test performs poorly in both case-control and TND when VE is high because the number of vaccinated test-positive cases can be low or zero. Continuity corrections help to stabilize the variance but induce bias. We observe superior performance with the score test as the variance is pooled under the null hypothesis of no group differences. We recommend using a score-based approach to design and analyze both case-control and TND. We propose a modification to the TND score sample size to account for additional variability in the ratio of controls over cases. This work enhances our understanding of the data generating mechanism in a test-negative design (TND) and how it is distinct from that of a case-control study due to its passive recruitment of controls.

Subject(s)

Research Design , Humans , Sample Size , Case-Control Studies , Vaccine Efficacy/statistics & numerical data , Logistic Models , Computer Simulation , Odds Ratio , Vaccination/statistics & numerical data , Observational Studies as Topic/methods , Observational Studies as Topic/statistics & numerical data

13.

Sample Size Estimation Using a Partially Clustered Frailty Model for Biomarker-Strategy Designs With Multiple Treatments.

Dinart, Derek; Rondeau, Virginie; Bellera, Carine.

Pharm Stat ; 2024 Jul 16.

Article in English | MEDLINE | ID: mdl-39014905

ABSTRACT

Biomarker-guided therapy is a growing area of research in medicine. To optimize the use of biomarkers, several study designs including the biomarker-strategy design (BSD) have been proposed. Unlike traditional designs, the emphasis here is on comparing treatment strategies and not on treatment molecules as such. Patients are assigned to either a biomarker-based strategy (BBS) arm, in which biomarker-positive patients receive an experimental treatment that targets the identified biomarker, or a non-biomarker-based strategy (NBBS) arm, in which patients receive treatment regardless of their biomarker status. We proposed a simulation method based on a partially clustered frailty model (PCFM) as well as an extension of Freidlin formula to estimate the sample size required for BSD with multiple targeted treatments. The sample size was mainly influenced by the heterogeneity of treatment effect, the proportion of biomarker-negative patients, and the randomization ratio. The PCFM is well suited for the data structure and offers an alternative to traditional methodologies.

14.

Strategy for Designing In Vivo Dose-Response Comparison Studies.

Novick, Steven; Zhang, Tianhui.

Pharm Stat ; 2024 Jul 17.

Article in English | MEDLINE | ID: mdl-39015015

ABSTRACT

In preclinical drug discovery, at the step of lead optimization of a compound, in vivo experimentation can differentiate several compounds in terms of efficacy and potency in a biological system of whole living organisms. For the lead optimization study, it may be desirable to implement a dose-response design so that compound comparisons can be made from nonlinear curves fitted to the data. A dose-response design requires more thought relative to a simpler study design, needing parameters for the number of doses, the dose values, and the sample size per dose. This tutorial illustrates how to calculate statistical power, choose doses, and determine sample size per dose for a comparison of two or more dose-response curves for a future in vivo study.

15.

Why is a small sample size not enough?

Cao, Ying; Chen, Ronald C; Katz, Aaron J.

Oncologist ; 2024 Jun 27.

Article in English | MEDLINE | ID: mdl-38934301

ABSTRACT

BACKGROUND: Clinical studies are often limited by resources available, which results in constraints on sample size. We use simulated data to illustrate study implications when the sample size is too small. METHODS AND RESULTS: Using 2 theoretical populations each with Nâ=â1000, we randomly sample 10 from each population and conduct a statistical comparison, to help make a conclusion about whether the 2 populations are different. This exercise is repeated for a total of 4 studies: 2 concluded that the 2 populations are statistically significantly different, while 2 showed no statistically significant difference. CONCLUSIONS: Our simulated examples demonstrate that sample sizes play important roles in clinical research. The results and conclusions, in terms of estimates of means, medians, Pearson correlations, chi-square test, and P values, are unreliable with small samples.

16.

Simplified Sample Size Formulas for Detecting a Medically Important Effect.

Indrayan, Abhaya; Mishra, Aman; Bhaskarapillai, Binukumar.

Indian J Community Med ; 49(3): 464-471, 2024.

Article in English | MEDLINE | ID: mdl-38933799

ABSTRACT

The sample size is just about the most common question in the minds of many medical researchers. This size determines the reliability of the results and helps to detect a medically important effect when present. Some studies miss an important effect due to inappropriate sample size. Many postgraduate students and established researchers often contact a statistician to help them determine an appropriate sample size for their study. More than 80 formulas are available to calculate sample size for different settings and the choice requires some expertise. Their use is even more difficult because most exact formulas are quite complex. An added difficulty is that different books, software, and websites use different formulas for the same problem. Such discrepancy in the published formulas confounds a biostatistician also. The objective of this communication is to present uniformly looking formulas for many situations together at one place in their simple but correct form, along with the setting where they are applicable. This will help in choosing an appropriate formula for the kind of research one is proposing to do and use it with confidence. This communication is restricted to the sample size required to detect a medically important effect when present - known to the statisticians as the test of hypothesis situation. Such a collection is not available anywhere, not even in any book. The sample size formulas for estimation are different and not discussed here.

17.

A seamless phase II/III design with dose optimization for oncology drug development.

Li, Yuhan; Zhang, Yiding; Mi, Gu; Lin, Ji.

Stat Med ; 43(18): 3383-3402, 2024 Aug 15.

Article in English | MEDLINE | ID: mdl-38845095

ABSTRACT

The US FDA's Project Optimus initiative that emphasizes dose optimization prior to marketing approval represents a pivotal shift in oncology drug development. It has a ripple effect for rethinking what changes may be made to conventional pivotal trial designs to incorporate a dose optimization component. Aligned with this initiative, we propose a novel seamless phase II/III design with dose optimization (SDDO framework). The proposed design starts with dose optimization in a randomized setting, leading to an interim analysis focused on optimal dose selection, trial continuation decisions, and sample size re-estimation (SSR). Based on the decision at interim analysis, patient enrollment continues for both the selected dose arm and control arm, and the significance of treatment effects will be determined at final analysis. The SDDO framework offers increased flexibility and cost-efficiency through sample size adjustment, while stringently controlling the Type I error. This proposed design also facilitates both accelerated approval (AA) and regular approval in a "one-trial" approach. Extensive simulation studies confirm that our design reliably identifies the optimal dosage and makes preferable decisions with a reduced sample size while retaining statistical power.

Subject(s)

Antineoplastic Agents , Clinical Trials, Phase II as Topic , Clinical Trials, Phase III as Topic , Drug Development , Humans , Clinical Trials, Phase II as Topic/methods , Antineoplastic Agents/administration & dosage , Antineoplastic Agents/therapeutic use , Drug Development/methods , Sample Size , Computer Simulation , Dose-Response Relationship, Drug , Research Design , United States , United States Food and Drug Administration , Drug Approval , Randomized Controlled Trials as Topic , Neoplasms/drug therapy

18.

Sample Size Considerations for Fine-Tuning Large Language Models for Named Entity Recognition Tasks: Methodological Study.

Majdik, Zoltan P; Graham, S Scott; Shiva Edward, Jade C; Rodriguez, Sabrina N; Karnes, Martha S; Jensen, Jared T; Barbour, Joshua B; Rousseau, Justin F.

JMIR AI ; 3: e52095, 2024 May 16.

Article in English | MEDLINE | ID: mdl-38875593

ABSTRACT

BACKGROUND: Large language models (LLMs) have the potential to support promising new applications in health informatics. However, practical data on sample size considerations for fine-tuning LLMs to perform specific tasks in biomedical and health policy contexts are lacking. OBJECTIVE: This study aims to evaluate sample size and sample selection techniques for fine-tuning LLMs to support improved named entity recognition (NER) for a custom data set of conflicts of interest disclosure statements. METHODS: A random sample of 200 disclosure statements was prepared for annotation. All "PERSON" and "ORG" entities were identified by each of the 2 raters, and once appropriate agreement was established, the annotators independently annotated an additional 290 disclosure statements. From the 490 annotated documents, 2500 stratified random samples in different size ranges were drawn. The 2500 training set subsamples were used to fine-tune a selection of language models across 2 model architectures (Bidirectional Encoder Representations from Transformers [BERT] and Generative Pre-trained Transformer [GPT]) for improved NER, and multiple regression was used to assess the relationship between sample size (sentences), entity density (entities per sentence [EPS]), and trained model performance (F1-score). Additionally, single-predictor threshold regression models were used to evaluate the possibility of diminishing marginal returns from increased sample size or entity density. RESULTS: Fine-tuned models ranged in topline NER performance from F1-score=0.79 to F1-score=0.96 across architectures. Two-predictor multiple linear regression models were statistically significant with multiple R2 ranging from 0.6057 to 0.7896 (all P<.001). EPS and the number of sentences were significant predictors of F1-scores in all cases ( P<.001), except for the GPT-2_large model, where EPS was not a significant predictor (P=.184). Model thresholds indicate points of diminishing marginal return from increased training data set sample size measured by the number of sentences, with point estimates ranging from 439 sentences for RoBERTa_large to 527 sentences for GPT-2_large. Likewise, the threshold regression models indicate a diminishing marginal return for EPS with point estimates between 1.36 and 1.38. CONCLUSIONS: Relatively modest sample sizes can be used to fine-tune LLMs for NER tasks applied to biomedical text, and training data entity density should representatively approximate entity density in production data. Training data quality and a model architecture's intended use (text generation vs text processing or classification) may be as, or more, important as training data volume and model parameter size.

19.

Adaptive Randomization Method to Prevent Extreme Instances of Group Size and Covariate Imbalance in Stroke Trials.

Italiano, Dominic; Campbell, Bruce; Hill, Michael D; Johns, Hannah T; Churilov, Leonid.

Stroke ; 55(8): 1962-1972, 2024 Aug.

Article in English | MEDLINE | ID: mdl-38920051

ABSTRACT

BACKGROUND: A recent review of randomization methods used in large multicenter clinical trials within the National Institutes of Health Stroke Trials Network identified preservation of treatment allocation randomness, achievement of the desired group size balance between treatment groups, achievement of baseline covariate balance, and ease of implementation in practice as critical properties required for optimal randomization designs. Common-scale minimal sufficient balance (CS-MSB) adaptive randomization effectively controls for covariate imbalance between treatment groups while preserving allocation randomness but does not balance group sizes. This study extends the CS-MSB adaptive randomization method to achieve both group size and covariate balance while preserving allocation randomness in hyperacute stroke trials. METHODS: A full factorial in silico simulation study evaluated the performance of the proposed new CSSize-MSB adaptive randomization method in achieving group size balance, covariate balance, and allocation randomness compared with the original CS-MSB method. Data from 4 existing hyperacute stroke trials were used to investigate the performance of CSSize-MSB for a range of sample sizes and covariate numbers and types. A discrete-event simulation model created with AnyLogic was used to dynamically visualize the decision logic of the CSSize-MSB randomization process for communication with clinicians. RESULTS: The proposed new CSSize-MSB algorithm uniformly outperformed the CS-MSB algorithm in controlling for group size imbalance while maintaining comparable levels of covariate balance and allocation randomness in hyperacute stroke trials. This improvement was consistent across a distribution of simulated trials with varying levels of imbalance but was increasingly pronounced for trials with extreme cases of imbalance. The results were consistent across a range of trial data sets of different sizes and covariate numbers and types. CONCLUSIONS: The proposed adaptive CSSize-MSB algorithm successfully controls for group size imbalance in hyperacute stroke trials under various settings, and its logic can be readily explained to clinicians using dynamic visualization.

Subject(s)

Stroke , Humans , Sample Size , Randomized Controlled Trials as Topic/methods , Computer Simulation , Random Allocation , Research Design

20.

Statistical Considerations for the Design and Analysis of Pragmatic Trials in Aging Research.

Kammar-García, Ashuin; Fernández-Urrutia, Liliana Aline; Guevara-Díaz, Jorge Alberto; Mancilla-Galindo, Javier.

Geriatrics (Basel) ; 9(3)2024 Jun 04.

Article in English | MEDLINE | ID: mdl-38920431

ABSTRACT

Pragmatic trials aim to assess intervention efficacy in usual patient care settings, contrasting with explanatory trials conducted under controlled conditions. In aging research, pragmatic trials are important designs for obtaining real-world evidence in elderly populations, which are often underrepresented in trials. In this review, we discuss statistical considerations from a frequentist approach for the design and analysis of pragmatic trials. When choosing the dependent variable, it is essential to use an outcome that is highly relevant to usual medical care while also providing sufficient statistical power. Besides traditionally used binary outcomes, ordinal outcomes can provide pragmatic answers with gains in statistical power. Cluster randomization requires careful consideration of sample size calculation and analysis methods, especially regarding missing data and outcome variables. Mixed effects models and generalized estimating equations (GEEs) are recommended for analysis to account for center effects, with tools available for sample size estimation. Multi-arm studies pose challenges in sample size calculation, requiring adjustment for design effects and consideration of multiple comparison correction methods. Secondary analyses are common but require caution due to the risk of reduced statistical power and false-discovery rates. Safety data collection methods should balance pragmatism and data quality. Overall, understanding statistical considerations is crucial for designing rigorous pragmatic trials that evaluate interventions in elderly populations under real-world conditions. In conclusion, this review focuses on various statistical topics of interest to those designing a pragmatic clinical trial, with consideration of aspects of relevance in the aging research field.

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL