ABSTRACT
BACKGROUND: In clinical trials, the determination of an adequate sample size is a challenging task, mainly due to the uncertainty about the value of the effect size and nuisance parameters. One method to deal with this uncertainty is a sample size recalculation. Thereby, an interim analysis is performed based on which the sample size for the remaining trial is adapted. With few exceptions, previous literature has only examined the potential of recalculation in two-stage trials. METHODS: In our research, we address sample size recalculation in three-stage trials, i.e. trials with two pre-planned interim analyses. We show how recalculation rules from two-stage trials can be modified to be applicable to three-stage trials. We also illustrate how a performance measure, recently suggested for two-stage trial recalculation (the conditional performance score) can be applied to evaluate recalculation rules in three-stage trials, and we describe performance evaluation in those trials from the global point of view. To assess the potential of recalculation in three-stage trials, we compare, in a simulation study, two-stage group sequential designs with three-stage group sequential designs as well as multiple three-stage designs with recalculation. RESULTS: While we observe a notable favorable effect in terms of power and expected sample size by using three-stage designs compared to two-stage designs, the benefits of recalculation rules appear less clear and are dependent on the performance measures applied. CONCLUSIONS: Sample size recalculation is also applicable in three-stage designs. However, the extent to which recalculation brings benefits depends on which trial characteristics are most important to the applicants.
Subjects
Clinical Trials as Topic; Research Design; Sample Size; Humans; Clinical Trials as Topic/methods; Clinical Trials as Topic/statistics & numerical data; Research Design/statistics & numerical data; Computer Simulation
ABSTRACT
BACKGROUND: Epidemiological and clinical studies often have missing data, frequently analysed using multiple imputation (MI). In general, MI estimates will be biased if data are missing not at random (MNAR). Bias due to data MNAR can be reduced by including other variables ("auxiliary variables") in imputation models, in addition to those required for the substantive analysis. Common advice is to take an inclusive approach to auxiliary variable selection (i.e. include all variables thought to be predictive of missingness and/or the missing values). There are no clear guidelines about the impact of this strategy when data may be MNAR. METHODS: We explore the impact of including an auxiliary variable predictive of missingness but, in truth, unrelated to the partially observed variable, when data are MNAR. We quantify, algebraically and by simulation, the magnitude of the additional bias of the MI estimator for the exposure coefficient (fitting either a linear or logistic regression model), when the (continuous or binary) partially observed variable is either the analysis outcome or the exposure. Here, "additional bias" refers to the difference in magnitude of the MI estimator when the imputation model includes (i) the auxiliary variable and the other analysis model variables; (ii) just the other analysis model variables, noting that both will be biased due to data MNAR. We illustrate the extent of this additional bias by re-analysing data from a birth cohort study. RESULTS: The additional bias can be relatively large when the outcome is partially observed and missingness is caused by the outcome itself, and even larger if missingness is caused by both the outcome and the exposure (when either the outcome or exposure is partially observed). CONCLUSIONS: When using MI, the naïve and commonly used strategy of including all available auxiliary variables should be avoided. 
We recommend including the variables most predictive of the partially observed variable as auxiliary variables, where these can be identified through consideration of the plausible causal diagrams and missingness mechanisms, as well as data exploration (noting that associations with the partially observed variable in the complete records may be distorted due to selection bias).
Subjects
Bias; Humans; Data Interpretation, Statistical; Models, Statistical; Computer Simulation; Algorithms; Logistic Models; Research Design/statistics & numerical data
ABSTRACT
BACKGROUND: Pocock-Simon's minimisation method has been widely used to balance treatment assignments across prognostic factors in randomised controlled trials (RCTs). Previous studies focusing on the survival outcomes have demonstrated that the conservativeness of asymptotic tests without adjusting for stratification factors, as well as the inflated type I error rate of adjusted asymptotic tests conducted in a small sample of patients, can be relaxed using re-randomisation tests. Although several RCTs using minimisation have suggested the presence of non-proportional hazards (non-PH) effects, the application of re-randomisation tests has been limited to the log-rank test and Cox PH models, which may result in diminished statistical power when confronted with non-PH scenarios. To address this issue, we proposed two re-randomisation tests based on a maximum combination of weighted log-rank tests (MaxCombo test) and the difference in restricted mean survival time (dRMST) up to a fixed time point τ , both of which can be extended to adjust for randomisation stratification factors. METHODS: We compared the performance of asymptotic and re-randomisation tests using the MaxCombo test, dRMST, log-rank test, and Cox PH models, assuming various non-PH situations for RCTs using minimisation, with total sample sizes of 50, 100, and 500 at a 1:1 allocation ratio. We mainly considered null, and alternative scenarios featuring delayed, crossing, and diminishing treatment effects. RESULTS: Across all examined null scenarios, re-randomisation tests maintained the type I error rates at the nominal level. Conversely, unadjusted asymptotic tests indicated excessive conservatism, while adjusted asymptotic tests in both the Cox PH models and dRMST indicated inflated type I error rates for total sample sizes of 50. The stratified MaxCombo-based re-randomisation test consistently exhibited robust power across all examined scenarios. 
CONCLUSIONS: For RCTs using minimisation, the re-randomisation test based on the stratified MaxCombo test is a useful alternative in non-PH situations, showing robust power across the scenarios examined.
Subjects
Proportional Hazards Models; Randomized Controlled Trials as Topic; Humans; Randomized Controlled Trials as Topic/methods; Randomized Controlled Trials as Topic/statistics & numerical data; Research Design/statistics & numerical data; Survival Analysis; Models, Statistical; Data Interpretation, Statistical
ABSTRACT
BACKGROUND: Causal mediation analysis plays a crucial role in examining causal effects and causal mechanisms. Yet, limited work has taken into consideration the use of sampling weights in causal mediation analysis. In this study, we compared different strategies of incorporating sampling weights into causal mediation analysis. METHODS: We conducted a simulation study to assess 4 different sampling weighting strategies: 1) not using sampling weights, 2) incorporating sampling weights into mediation "cross-world" weights, 3) using sampling weights when estimating the outcome model, and 4) using sampling weights in both stages. We generated 8 simulated population scenarios comprising an exposure (A), an outcome (Y), a mediator (M), and six covariates (C), all of which were binary. The data were generated so that the true model of A given C and the true model of A given M and C were both logit models. We crossed these 8 population scenarios with 4 different sampling methods to obtain 32 total simulation conditions. For each simulation condition, we assessed the performance of the 4 sampling weighting strategies when calculating sample-based estimates of the total, direct, and indirect effects. We also applied the four sampling weighting strategies to a case study using data from the National Survey on Drug Use and Health (NSDUH). RESULTS: Using sampling weights in both stages (mediation weight estimation and outcome models) had the lowest bias under most simulation conditions examined. Using sampling weights in only one stage led to greater bias for multiple simulation conditions. DISCUSSION: Using sampling weights in both stages is an effective approach to reduce bias in causal mediation analyses under a variety of conditions regarding the structure of the population data and sampling methods.
Subjects
Causality; Mediation Analysis; Humans; Computer Simulation; Sampling Studies; Models, Statistical; Research Design/statistics & numerical data; Data Interpretation, Statistical
ABSTRACT
BACKGROUND: An adaptive design allows modifying the design based on accumulated data while maintaining trial validity and integrity. The final sample size may be unknown when designing an adaptive trial. It is therefore important to consider what sample size is used in the planning of the study and how that is communicated to add transparency to the understanding of the trial design and facilitate robust planning. In this paper, we reviewed trial protocols and grant applications on the sample size reporting for randomised adaptive trials. METHOD: We searched protocols of randomised trials with comparative objectives on ClinicalTrials.gov (01/01/2010 to 31/12/2022). Contemporary eligible grant applications accessed from UK publicly funded researchers were also included. Suitable records of adaptive designs were reviewed, and key information was extracted and descriptively analysed. RESULTS: We identified 439 records, and 265 trials were eligible. Of these, 164 (61.9%) and 101 (38.1%) were sponsored by industry and public sectors, respectively, with 169 (63.8%) of all trials using a group sequential design although trial adaptations used were diverse. The maximum and minimum sample sizes were the most reported or directly inferred (n = 199, 75.1%). The sample size assuming no adaptation would be triggered was usually set as the estimated target sample size in the protocol. However, of the 152 completed trials, 15 (9.9%) and 33 (21.7%) had their sample size increased or reduced triggered by trial adaptations, respectively. The sample size calculation process was generally well reported in most cases (n = 216, 81.5%); however, the justification for the sample size calculation parameters was missing in 116 (43.8%) trials. Less than half gave sufficient information on the study design operating characteristics (n = 119, 44.9%). CONCLUSION: Although the reporting of sample sizes varied, the maximum and minimum sample sizes were usually reported. 
Most of the trials were planned for estimated enrolment assuming no adaptation would be triggered. This is despite the fact that a third of reported trials changed their sample size. The sample size calculation was generally well reported, but the justification of sample size calculation parameters and the reporting of the statistical behaviour of the adaptive design could still be improved.
Subjects
Research Design; Humans; Adaptive Clinical Trials as Topic/statistics & numerical data; Adaptive Clinical Trials as Topic/methods; Communication; Research Design/statistics & numerical data; Sample Size
ABSTRACT
BACKGROUND: Simulation is an important tool for assessing the performance of statistical methods for the analysis of data and for the planning of studies. While methods are available for the simulation of correlated binary random variables, all have significant practical limitations for simulating outcomes from longitudinal cluster randomised trial designs, such as the cluster randomised crossover and the stepped wedge trial designs. For these trial designs as the number of observations in each cluster increases these methods either become computationally infeasible or their range of allowable correlations rapidly shrinks to zero. METHODS: In this paper we present a simple method for simulating binary random variables with a specified vector of prevalences and correlation matrix. This method allows for the outcome prevalence to change due to treatment or over time, and for a 'nested exchangeable' correlation structure, in which observations in the same cluster are more highly correlated if they are measured in the same time period than in different time periods, and where different individuals are measured in each time period. This means that our method is also applicable to more general hierarchical clustered data contexts, such as students within classrooms within schools. The method is demonstrated by simulating 1000 datasets with parameters matching those derived from data from a cluster randomised crossover trial assessing two variants of stress ulcer prophylaxis. RESULTS: Our method is orders of magnitude faster than the most well known general simulation method while also allowing a much wider range of correlations than alternative methods. An implementation of our method is available in an R package NestBin. CONCLUSIONS: This simulation method is the first to allow for practical and efficient simulation of large datasets of binary outcomes with the commonly used nested exchangeable correlation structure. 
This will allow for much more effective testing of designs and inference methods for longitudinal cluster randomised trials with binary outcomes.
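Illustrative note: the nested exchangeable structure described in this abstract can be reproduced with a simple mixture construction. The sketch below is an assumption made for illustration and is not the algorithm of the NestBin package; the function name `simulate_cluster` and its parameters are hypothetical. It assumes a common prevalence `p` for all observations, whereas the abstract's method also allows prevalences to vary by treatment and time.

```python
import random
from math import sqrt


def simulate_cluster(p, rho_within, rho_between, n_periods, n_per_period, rng):
    """Simulate one cluster of binary outcomes (hypothetical helper).

    Each observation copies a cluster-level Bernoulli(p) draw with
    probability a, copies a cluster-period-level draw with probability b,
    or takes its own independent Bernoulli(p) draw otherwise.
    This gives correlation a^2 + b^2 within a period and a^2 across periods.
    """
    if not 0 <= rho_between <= rho_within <= 1:
        raise ValueError("need 0 <= rho_between <= rho_within <= 1")
    a = sqrt(rho_between)
    b = sqrt(rho_within - rho_between)
    if a + b > 1:
        raise ValueError("correlations outside the range this construction supports")

    z_cluster = rng.random() < p          # shared by the whole cluster
    data = []
    for _ in range(n_periods):
        z_period = rng.random() < p       # shared within one cluster-period
        row = []
        for _ in range(n_per_period):
            u = rng.random()
            if u < a:
                y = z_cluster
            elif u < a + b:
                y = z_period
            else:
                y = rng.random() < p      # individual-level draw
            row.append(int(y))
        data.append(row)
    return data


rng = random.Random(2024)
print(simulate_cluster(0.3, 0.05, 0.02, n_periods=2, n_per_period=5, rng=rng))
```

The construction only supports correlation pairs for which `sqrt(rho_between) + sqrt(rho_within - rho_between)` does not exceed 1, so it is narrower than a general-purpose method.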
Subjects
Computer Simulation; Cross-Over Studies; Randomized Controlled Trials as Topic; Humans; Randomized Controlled Trials as Topic/methods; Randomized Controlled Trials as Topic/statistics & numerical data; Longitudinal Studies; Cluster Analysis; Research Design/statistics & numerical data; Models, Statistical; Data Interpretation, Statistical; Algorithms
ABSTRACT
BACKGROUND: A platform trial approach allows adding arms to on-going trials to speed up intervention discovery programs. A control arm remains open for recruitment in a platform trial while intervention arms may be added after the onset of the study and could be terminated early for efficacy and/or futility when early stopping is allowed. The topic of utilising non-concurrent control data in the analysis of platform trials has been explored and discussed extensively. A less familiar issue is the presence of heterogeneity, which may exist for example due to modification of enrolment criteria and recruitment strategy. METHOD: We conduct a simulation study to explore the impact of heterogeneity on the analysis of a two-stage platform trial design. We consider heterogeneity in treatment effects and heteroscedasticity in outcome data across stages for a normally distributed endpoint. We examine the performance of some hypothesis testing procedures and modelling strategies. The use of non-concurrent control data is also considered accordingly. Alongside standard regression analysis, we examine the performance of a novel method known as the pairwise trials analysis. It is similar to a network meta-analysis approach but adjusts for treatment comparisons instead of individual studies using fixed effects. RESULTS: Several testing strategies with concurrent control data seem to control the type I error rate at the required level when there is heteroscedasticity in outcome data across stages and/or a random cohort effect. The main treatment effect parameter in some analysis models corresponds to the overall treatment effect weighted by stage-wise sample sizes, while in others it corresponds to the effect observed within a single stage. The characteristics of the estimates are not affected significantly by the presence of a random cohort effect and/or heteroscedasticity.
CONCLUSION: In view of heterogeneity in treatment effect across stages, the specification of null hypotheses in platform trials may need to be more subtle. We suggest employing the testing procedures of adaptive designs rather than testing the statistics from regression models; comparing the estimates from the pairwise trials analysis method and the regression model with interaction terms may indicate whether heterogeneity is negligible.
Subjects
Research Design; Humans; Research Design/statistics & numerical data; Clinical Trials as Topic/methods; Clinical Trials as Topic/statistics & numerical data; Computer Simulation; Models, Statistical; Data Interpretation, Statistical; Regression Analysis; Treatment Outcome
ABSTRACT
BACKGROUND: The procedures used to assess the methodological quality and risk of bias (RoB) of systematic reviews of observational dental studies have not been investigated. The purpose of this research was to examine the way that authors of systematic reviews of epidemiological observational studies published in dentistry conducted the methodological assessment of those primary studies. In the present article, we aimed to assess the characteristics and the level of reporting of tools used to assess the methodologies of these reviews. METHODS: We searched Scopus and the Web of Science from their inceptions to June 2023 for systematic reviews with meta-analyses of observational studies published in dentistry. Document selection and data extraction were performed in duplicate and independently by two authors. In a random sample of 10% of the systematic reviews, there was an agreement of more than 80% between the reviewers; data selection and extraction were conducted in the remaining 90% of the sample by one author. Data on the article and systematic review characteristics were extracted and recorded for descriptive reporting. RESULTS: The search in the two databases resulted in the inclusion of 3,214 potential documents. After the elimination of duplicates and the application of the eligibility criteria, a total of 399 systematic reviews were identified and included. A total of 368 systematic reviews reported a methodological tool, of which 102 used the Newcastle-Ottawa scale. Additionally, 76 systematic reviews stated the use of a modified methodological tool. Information about the approach of assessing the methodological quality or RoB of primary studies but reporting no tool or tool name occurred in 25 reviews. CONCLUSIONS: The majority of authors of systematic reviews of epidemiological observational studies published in dentistry reported the tools used to assess the methodological quality or RoB of the included primary studies. 
Modifying existing tools to meet the individual characteristics of various studies should be considered.
Subjects
Observational Studies as Topic; Systematic Reviews as Topic; Humans; Observational Studies as Topic/methods; Observational Studies as Topic/statistics & numerical data; Systematic Reviews as Topic/methods; Research Design/statistics & numerical data; Epidemiologic Studies; Bias; Meta-Analysis as Topic; Authorship; Dentistry/methods; Dentistry/statistics & numerical data
ABSTRACT
Practical limitations of quality and quantity of data can limit the precision of parameter identification in mathematical models. Model-based experimental design approaches have been developed to minimise parameter uncertainty, but the majority of these approaches have relied on first-order approximations of model sensitivity at a local point in parameter space. Practical identifiability approaches such as profile-likelihood have shown potential for quantifying parameter uncertainty beyond linear approximations. This research presents a genetic algorithm approach to optimise sample timing across various parameterisations of a demonstrative PK-PD model with the goal of aiding experimental design. The optimisation relies on a chosen metric of parameter uncertainty that is based on the profile-likelihood method. Additionally, the approach considers cases where multiple parameter scenarios may require simultaneous optimisation. The genetic algorithm approach was able to locate near-optimal sampling protocols for a wide range of sample numbers (n = 3-20), and it reduced the parameter variance metric by 33-37% on average. The profile-likelihood metric also correlated well with an existing Monte Carlo-based metric (with a worst-case r > 0.89), while reducing computational cost by an order of magnitude. The combination of the new profile-likelihood metric and the genetic algorithm demonstrates the feasibility of considering the nonlinear nature of models in optimal experimental design at a reasonable computational cost. The outputs of such a process could allow experimenters to either improve parameter certainty given a fixed number of samples, or reduce sample quantity while retaining the same level of parameter certainty.
Subjects
Algorithms; Computer Simulation; Mathematical Concepts; Models, Biological; Monte Carlo Method; Likelihood Functions; Humans; Dose-Response Relationship, Drug; Research Design/statistics & numerical data; Models, Genetic; Uncertainty
ABSTRACT
The failure rates of phase 3 trials are high. Incorrect sample size due to uncertainty of effect size could be a critical contributing factor. Adaptive sequential design (ASD), which may include one or more sample size re-estimations (SSR), has been a popular approach for dealing with such uncertainties. The operating characteristics (OCs) of ASD, including the unconditional power and mean sample size, can be substantially affected by many factors, including the planned sample size, the interim analysis schedule and choice of critical boundaries and rules for interim analysis. We propose a systematic, comprehensive strategy which uses iterative simulations to investigate the operating characteristics of adaptive designs and help achieve adequate unconditional power and cost-effective mean sample size if the effect size is in a pre-identified range.
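Illustrative note: the idea of estimating operating characteristics by simulation can be sketched for the simplest case, a two-stage group sequential design with early stopping for efficacy and no sample size re-estimation. This is a minimal sketch under stated assumptions, not the strategy proposed in the abstract: the function name `simulate_ocs` is hypothetical, the endpoint is assumed normal with known variance, both stages have equal size, and the same critical value 2.178 (the Pocock constant for two looks at one-sided alpha = 0.025) is used at both looks.

```python
import random
from math import sqrt


def simulate_ocs(delta, n_stage, crit=2.178, n_sims=20000, seed=0):
    """Estimate (power, mean per-arm sample size) by simulation.

    delta   : standardized effect size (mean difference / SD)
    n_stage : patients per arm recruited in each of the two stages
    crit    : critical value applied at the interim and final analysis
    """
    rng = random.Random(seed)
    drift = delta * sqrt(n_stage / 2)   # mean of each stage-wise z-statistic
    rejections = 0
    total_n = 0
    for _ in range(n_sims):
        z1 = rng.gauss(drift, 1)
        if z1 >= crit:                  # stop early for efficacy
            rejections += 1
            total_n += n_stage
            continue
        z2 = rng.gauss(drift, 1)        # independent second-stage increment
        z_final = (z1 + z2) / sqrt(2)   # pooled statistic, equal stage sizes
        total_n += 2 * n_stage
        if z_final >= crit:
            rejections += 1
    return rejections / n_sims, total_n / n_sims


# Scan a range of plausible effect sizes, as one iteration of a design search.
for delta in (0.0, 0.3, 0.4, 0.5):
    power, mean_n = simulate_ocs(delta, n_stage=50)
    print(f"delta={delta:.1f}  power={power:.3f}  mean n per arm={mean_n:.1f}")
```

Repeating such a scan while varying the planned sample size, the interim timing, or the boundaries is one way to see how each factor moves power and mean sample size.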
Subjects
Computer Simulation; Research Design; Humans; Sample Size; Research Design/statistics & numerical data; Models, Statistical; Clinical Trials, Phase III as Topic/statistics & numerical data; Clinical Trials, Phase III as Topic/methods; Data Interpretation, Statistical
ABSTRACT
Latent repeated measures ANOVA (L-RM-ANOVA) has recently been proposed as an alternative to traditional repeated measures ANOVA. L-RM-ANOVA builds upon structural equation modeling and enables researchers to investigate interindividual differences in main/interaction effects, examine custom contrasts, incorporate a measurement model, and account for missing data. However, L-RM-ANOVA uses maximum likelihood and thus cannot incorporate prior information and can have poor statistical properties in small samples. We show how L-RM-ANOVA can be used with Bayesian estimation to resolve the aforementioned issues. We demonstrate how to place informative priors on model parameters that constitute main and interaction effects. We further show how to place weakly informative priors on standardized parameters which can be used when no prior information is available. We conclude that Bayesian estimation can lower Type 1 error and bias, and increase power and efficiency when priors are chosen adequately. We demonstrate the approach using a real empirical example and guide the readers through specification of the model. We argue that ANOVA tables and incomplete descriptive statistics are not sufficient information to specify informative priors, and we identify which parameter estimates should be reported in future research; thereby promoting cumulative research.
Subjects
Bayes Theorem; Humans; Analysis of Variance; Research Design/statistics & numerical data; Models, Statistical; Data Interpretation, Statistical; Latent Class Analysis; Likelihood Functions
ABSTRACT
With the advent of cancer immunotherapy, some special features including delayed treatment effect, cure rate, diminishing treatment effect and crossing survival are often observed in survival analysis. They violate the proportional hazard model assumption and pose a unique challenge for the conventional trial design and analysis strategies. Many methods like cure rate model have been developed based on mixture model to incorporate some of these features. In this work, we extend the mixture model to deal with multiple non-proportional patterns and develop its geometric average hazard ratio (gAHR) to quantify the treatment effect. We further derive a sample size and power formula based on the non-centrality parameter of the log-rank test and conduct a thorough analysis of the impact of each parameter on performance. Simulation studies showed a clear advantage of our new method over the proportional hazard based calculation across different non-proportional hazard scenarios. Moreover, the mixture modeling of two real trials demonstrates how to use the prior information on the survival distribution among patients with different biomarker and early efficacy results in practice. By comparison with a simulation-based design, the new method provided a more efficient way to compute the power and sample size with high accuracy of estimation. Overall, both theoretical derivation and empirical studies demonstrate the promise of the proposed method in powering future innovative trial designs.
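Illustrative note: for context, the conventional proportional-hazards calculation that the abstract uses as its comparator is commonly based on Schoenfeld's formula for the required number of events in a log-rank test. The sketch below shows that baseline only; it is not the gAHR-based formula proposed in the abstract, and the function name `schoenfeld_events` is hypothetical.

```python
from math import log
from statistics import NormalDist


def schoenfeld_events(hazard_ratio, alpha=0.05, power=0.8, alloc=0.5):
    """Required number of events for a two-sided log-rank test under PH.

    alloc is the proportion of patients allocated to one arm (0.5 = 1:1).
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)
    z_beta = z(power)
    return (z_alpha + z_beta) ** 2 / (alloc * (1 - alloc) * log(hazard_ratio) ** 2)


print(round(schoenfeld_events(0.7), 1))
```

Because this formula takes a single constant hazard ratio, it cannot represent delayed, diminishing, or crossing effects, which is the gap the abstract's method addresses.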
Subjects
Computer Simulation; Proportional Hazards Models; Research Design; Humans; Sample Size; Research Design/statistics & numerical data; Survival Analysis; Neoplasms/therapy; Neoplasms/drug therapy; Neoplasms/mortality; Models, Statistical; Immunotherapy/methods
ABSTRACT
Biomarkers are key components of personalized medicine. In this paper, we consider biomarkers taking continuous values that are associated with disease status, called case and control. The performance of such a biomarker is evaluated by the area under the curve (AUC) of its receiver operating characteristic curve. Oftentimes, two biomarkers are collected from each subject to test if one has a larger AUC than the other. We propose a simple non-parametric statistical test for comparing the performance of two biomarkers. We also present a simple sample size calculation method for this test statistic. Our sample size formula requires specification of AUC values (or the standardized effect size of each biomarker between cases and controls together with the correlation coefficient between two biomarkers), prevalence of cases in the study population, type I error rate, and power. Through simulations, we show that the testing on two biomarkers controls type I error rate accurately and the proposed sample size closely maintains specified statistical power.
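Illustrative note: the two quantities the abstract's sample size formula takes as input, the AUC and the standardized effect size, can be made concrete as follows. This is a sketch of standard definitions, not the test statistic proposed in the abstract. The function names are hypothetical, and the link between effect size and AUC assumes normally distributed biomarkers with equal variance in cases and controls.

```python
from math import erf


def empirical_auc(cases, controls):
    """Non-parametric (Mann-Whitney) AUC: P(case > control) + 0.5 * P(tie)."""
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def binormal_auc(effect_size):
    """AUC implied by a standardized effect size d: Phi(d / sqrt(2)).

    Written with erf, since Phi(t) = 0.5 * (1 + erf(t / sqrt(2))).
    """
    return 0.5 * (1 + erf(effect_size / 2))


print(empirical_auc([3, 4, 5], [1, 2, 3]))
print(round(binormal_auc(1.0), 4))
```

Comparing two biomarkers measured on the same subjects additionally requires the correlation between them, which is why the abstract lists it among the inputs.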
Subjects
Area Under Curve; Biomarkers; Computer Simulation; ROC Curve; Humans; Sample Size; Biomarkers/analysis; Case-Control Studies; Precision Medicine/methods; Precision Medicine/statistics & numerical data; Models, Statistical; Research Design/statistics & numerical data; Data Interpretation, Statistical
ABSTRACT
Multiregional clinical trials (MRCTs) have become increasingly common during the development of new drugs to obtain simultaneous drug approvals worldwide. When planning MRCTs, a major statistical challenge is determination of the regional sample size. In general, the regional sample size must be determined as the sample size such that the regional consistency probability, defined as the probability of meeting the regional consistency criterion, is greater than a prespecified value. The Japanese Ministry of Health, Labour and Welfare proposed two criteria for regional consistency. Moreover, many researchers have proposed corresponding closed-form formulas for calculating regional consistency probabilities when the primary outcome is continuous. Although some researchers have argued that those formulas are also applicable to cases with binary outcomes, it remains questionable whether such an argument can be true. Based on simulation results, we demonstrate that the existing formulas are inappropriate for binary cases, even when the regional sample size is sufficiently large. To address this issue, we develop alternative formulas and use simulation to show that they provide accurate regional consistency probabilities. Furthermore, we present an application of our proposed formulas for an MRCT of advanced or metastatic clear-cell renal cell carcinoma.
Subjects
Computer Simulation; Humans; Sample Size; Multicenter Studies as Topic/methods; Probability; Models, Statistical; Research Design/statistics & numerical data; Clinical Trials as Topic/methods; Clinical Trials as Topic/statistics & numerical data; Kidney Neoplasms/drug therapy; Carcinoma, Renal Cell/drug therapy; Drug Approval/methods; Data Interpretation, Statistical; Japan
ABSTRACT
Credibility of scientific claims is established with evidence for their replicability using new data. According to common understanding, replication is repeating a study's procedure and observing whether the prior finding recurs. This definition is intuitive, easy to apply, and incorrect. We propose that replication is a study for which any outcome would be considered diagnostic evidence about a claim from prior research. This definition reduces emphasis on operational characteristics of the study and increases emphasis on the interpretation of possible outcomes. The purpose of replication is to advance theory by confronting existing understanding with new evidence. Ironically, the value of replication may be strongest when existing understanding is weakest. Successful replication provides evidence of generalizability across the conditions that inevitably differ from the original study; unsuccessful replication indicates that the reliability of the finding may be more constrained than recognized previously. Defining replication as a confrontation of current theoretical expectations clarifies its important, exciting, and generative role in scientific progress.
Subjects
Research Design/statistics & numerical data; Research Design/standards; Data Interpretation, Statistical; Humans; Reproducibility of Results; Statistics as Topic
ABSTRACT
Researchers face many, often seemingly arbitrary, choices in formulating hypotheses, designing protocols, collecting data, analyzing data, and reporting results. Opportunistic use of "researcher degrees of freedom" aimed at obtaining statistical significance increases the likelihood of obtaining and publishing false-positive results and overestimated effect sizes. Preregistration is a mechanism for reducing such degrees of freedom by specifying designs and analysis plans before observing the research outcomes. The effectiveness of preregistration may depend, in part, on whether the process facilitates sufficiently specific articulation of such plans. In this preregistered study, we compared 2 formats of preregistration available on the OSF: Standard Pre-Data Collection Registration and Prereg Challenge Registration (now called "OSF Preregistration," http://osf.io/prereg/). The Prereg Challenge format was a "structured" workflow with detailed instructions and an independent review to confirm completeness; the "Standard" format was "unstructured" with minimal direct guidance to give researchers flexibility for what to prespecify. Results of comparing random samples of 53 preregistrations from each format indicate that the "structured" format restricted the opportunistic use of researcher degrees of freedom better (Cliff's Delta = 0.49) than the "unstructured" format, but neither eliminated all researcher degrees of freedom. We also observed very low concordance among coders about the number of hypotheses (14%), indicating that they are often not clearly stated. We conclude that effective preregistration is challenging, and registration formats that provide effective guidance may improve the quality of research.
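Illustrative note: the effect size reported in this abstract, Cliff's Delta, is the probability that a value from one group exceeds a value from the other, minus the reverse probability. The sketch below shows the standard definition on made-up numbers; the function name `cliffs_delta` and the example data are hypothetical and are not taken from the study.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Made-up scores for two groups, for illustration only.
print(cliffs_delta([2, 3, 4], [1, 3]))
```

The statistic ranges from -1 to 1, with 0 meaning complete overlap between the groups, and depends only on the ordering of the values.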
Subjects
Data Collection/methods, Research Design/statistics & numerical data, Data Collection/standards, Data Collection/trends, Humans, Quality Control, Registries/statistics & numerical data, Research Design/trends
ABSTRACT
Mayookha Mitra-Majumdar and Aaron Kesselheim reflect on steps taken to combat reporting bias in clinical trials over the last two decades.
Subjects
Bias, Clinical Trials as Topic/statistics & numerical data, Research Design/statistics & numerical data, Humans
ABSTRACT
BACKGROUND & AIMS: Ferroportin disease is a rare genetic iron overload disorder which may be underdiagnosed, with recent data suggesting it occurs at a higher prevalence than suspected. Costs and the lack of defined criteria to prompt genetic testing preclude large-scale molecular screening. Hence, we aimed to develop a readily available scoring system to promote and enhance ferroportin disease screening. METHODS: Our derivation cohort included probands tested for ferroportin disease from 2008 to 2016 in our rare disease network. Data were prospectively recorded. Univariate and multivariate logistic regression were used to determine significant criteria, and odds ratios were used to build a weighted score. A cut-off value was defined using a ROC curve with a predefined aim of 90% sensitivity. An independent cohort was used for cross validation. RESULTS: Our derivation cohort included 1,306 patients. Mean age was 55±14 years, ferritin 1,351±1,357 µg/L, and liver iron concentration (LIC) 166±77 µmol/g. Pathogenic variants (n = 32) were identified in 71 patients. In multivariate analysis: female sex, younger age, higher ferritin, higher LIC and the absence of hypertension or diabetes were significantly associated with the diagnosis of ferroportin disease (AUROC in whole derivation cohort 0.83 [0.78-0.88]). The weighted score was based on sex, age, the presence of hypertension or diabetes, ferritin level and LIC. An AUROC of 0.83 (0.77-0.88) was obtained in the derivation cohort without missing values. Using 9.5 as a cut-off, sensitivity was 93.6 (91.7-98.3) %, specificity 49.5 (45.5-53.6) %, positive likelihood ratio 1.8 (1.6-2.0) and negative likelihood ratio 0.17 (0.04-0.37). CONCLUSION: We describe a readily available score with simple criteria and good diagnostic performance that could be used to screen patients for ferroportin disease in routine clinical practice. LAY SUMMARY: Increased iron burden associated with metabolic syndrome is a very common condition. 
Ferroportin disease is a dominant genetic iron overload disorder whose prevalence is higher than initially thought. They can be difficult to distinguish from each other, but the limited availability of genetic testing and the lack of definitive guidelines prevent adequate screening. We herein describe a simple and definitive clinical score to help clinicians decide whether to perform genetic testing.
Subjects
Cation Transport Proteins/analysis, Hemochromatosis/diagnosis, Research Design/standards, Aged, Cation Transport Proteins/blood, Cohort Studies, Female, Hemochromatosis/blood, Humans, Iron/metabolism, Iron Overload/blood, Iron Overload/complications, Logistic Models, Male, Mass Screening/methods, Mass Screening/statistics & numerical data, Middle Aged, ROC Curve, Research Design/statistics & numerical data
ABSTRACT
OBJECTIVE: Spin is the manipulation of language that distorts the interpretation of objective findings. The purpose of this study is to describe the characteristics of spin found in statistically nonsignificant randomized controlled trials (RCT) comparing carotid endarterectomy with carotid artery stenting for carotid artery stenosis (CS), and endovascular repair with open repair (OR) for abdominal aortic aneurysms (AAA). METHODS: A search of MEDLINE, EMBASE, and the Cochrane Central Register of Controlled Trials was performed in June 2020 for studies published describing AAA or CS. All phase III RCTs with nonsignificant primary outcomes comparing open repair with endovascular repair or carotid endarterectomy to carotid artery stenting were included. Studies were appraised for the characteristics and severity of spin using a validated tool. Binary logistic regression was performed to assess the association of spin grade to (1) funding source (commercial vs noncommercial) and (2) the publishing journal's impact factor. RESULTS: Thirty-one of 355 articles captured were included for analysis. Spin was identified in 9 abstracts (9/18) and 13 main texts (13/18) of AAA articles and 7 abstracts (7/13) and 10 main texts (10/13) of CS articles. For both AAA and CS articles, spin was most commonly found in the discussion section, with the most commonly used strategy being the interpretation of statistically nonsignificant primary results to show treatment equivalence or rule out adverse treatment effects. Increasing journal impact factor was associated with a statistically significant lower likelihood of spin in the study title or abstract conclusion (β odds ratio, 0.96; 95% confidence interval, 0.94-0.98; P < .01); no significant association could be found with funding source (β odds ratio, 1.33; 95% confidence interval, 0.30-5.92; P = .71). CONCLUSIONS: A large proportion of statistically nonsignificant RCTs contain interpretations that are inconsistent with their results.
These findings should prompt authors and readers to appraise study findings independently and to limit the use of spin in study interpretations.
Subjects
Abdominal Aortic Aneurysm/surgery, Carotid Stenosis/surgery, Periodicals as Topic, Research Design, Vascular Surgical Procedures, Writing, Blood Vessel Prosthesis Implantation, Statistical Data Interpretation, Carotid Endarterectomy, Endovascular Procedures, Humans, Journal Impact Factor, Randomized Controlled Trials as Topic, Research Design/statistics & numerical data, Stents, Treatment Outcome, Vascular Surgical Procedures/instrumentation, Vascular Surgical Procedures/statistics & numerical data
ABSTRACT
BACKGROUND: Pilot studies test the feasibility of methods and procedures to be used in larger-scale studies. Although numerous articles describe guidelines for the conduct of pilot studies, few have included specific feasibility indicators or strategies for evaluating multiple aspects of feasibility. In addition, using pilot studies to estimate effect sizes to plan sample sizes for subsequent randomized controlled trials has been challenged; however, there has been little consensus on alternative strategies. METHODS: In Section 1, specific indicators (recruitment, retention, intervention fidelity, acceptability, adherence, and engagement) are presented for feasibility assessment of data collection methods and intervention implementation. Section 1 also highlights the importance of examining feasibility when adapting an intervention tested in mainstream populations to a new more diverse group. In Section 2, statistical and design issues are presented, including sample sizes for pilot studies, estimates of minimally important differences, design effects, confidence intervals (CI) and nonparametric statistics. An in-depth treatment of the limits of effect size estimation as well as process variables is presented. Tables showing CI around parameters are provided. With small samples, effect size, completion and adherence rate estimates will have large CI. CONCLUSION: This commentary offers examples of indicators for evaluating feasibility, and of the limits of effect size estimation in pilot studies. As demonstrated, most pilot studies should not be used to estimate effect sizes, provide power calculations for statistical tests or perform exploratory analyses of efficacy. It is hoped that these guidelines will be useful to those planning pilot/feasibility studies before a larger-scale study.
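The point that small samples yield wide confidence intervals for completion and adherence rates can be illustrated numerically. The sketch below uses the Wilson score interval for a proportion, which is one common choice and not necessarily the method behind the commentary's tables; the function name `wilson_interval`, the 80% completion rate, and the sample sizes are assumptions made for illustration.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical pilot sizes, each with an observed 80% completion rate.
for n in (20, 80, 320):
    low, high = wilson_interval(round(0.8 * n), n)
    print(f"n={n:4d}  95% CI: {low:.3f} to {high:.3f}  width={high - low:.3f}")
```

With the same observed rate, the interval narrows as the sample grows, which is why a rate estimated from a small pilot is a weak basis for planning a larger trial.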