Results 1 - 20 of 347
1.
Front Neurosci ; 18: 1381722, 2024.
Article in English | MEDLINE | ID: mdl-39156630

ABSTRACT

Introduction: Functional magnetic resonance imaging (fMRI) has become a fundamental tool for studying brain function. However, the presence of serial correlations in fMRI data complicates data analysis, violates the statistical assumptions of analysis methods, and can lead to incorrect conclusions in fMRI studies. Methods: In this paper, we show that conventional whitening procedures designed for data with longer repetition times (TRs) (>2 s) are inadequate for the increasingly used short-TR fMRI data. Furthermore, we comprehensively investigate the shortcomings of existing whitening methods and introduce an iterative whitening approach named "IDAR" (Iterative Data-adaptive Autoregressive model) to address these shortcomings. IDAR employs high-order autoregressive (AR) models with flexible and data-driven orders, offering the capability to model complex serial correlation structures in both short-TR and long-TR fMRI datasets. Results: Conventional whitening methods, such as AR(1), ARMA(1,1), and higher-order AR, were effective in reducing serial correlation in long-TR data but largely ineffective in short-TR data. In contrast, IDAR significantly outperformed conventional methods in addressing serial correlation, power, and Type-I error for both long-TR and especially short-TR data. However, IDAR could not effectively address residual correlations and inflated Type-I error at the same time. Discussion: This study highlights the urgent need to address the problem of serial correlation in short-TR (< 1 s) fMRI data, which are increasingly used in the field. Although IDAR can address this issue for a wide range of applications and datasets, the complexity of short-TR data necessitates continued exploration and innovative approaches. These efforts are essential to simultaneously reduce serial correlations and control Type-I error rates without compromising analytical power.
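The prewhitening idea the abstract builds on can be sketched in a few lines. The toy example below (plain Python, not the authors' IDAR implementation; all settings are illustrative) estimates an AR(1) coefficient from a simulated autocorrelated series and applies the classical whitening filter y_t - ρ̂·y_{t-1}, after which the residual lag-1 autocorrelation is near zero:

```python
import random
import statistics

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence."""
    m = statistics.fmean(x)
    num = sum((x[t] - m) * (x[t - 1] - m) for t in range(1, len(x)))
    den = sum((v - m) ** 2 for v in x)
    return num / den

def ar1_whiten(y):
    """Estimate the AR(1) coefficient from the data, then filter it out."""
    rho = lag1_autocorr(y)
    return [y[t] - rho * y[t - 1] for t in range(1, len(y))], rho

# Simulate an AR(1) noise series with true coefficient 0.5.
rng = random.Random(0)
y = [rng.gauss(0, 1)]
for _ in range(4999):
    y.append(0.5 * y[-1] + rng.gauss(0, 1))

whitened, rho_hat = ar1_whiten(y)
resid_ac = lag1_autocorr(whitened)   # close to zero after whitening
```

Higher-order and data-driven variants (as in IDAR) generalize this by regressing on several lags instead of a single one.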

2.
Stat Med ; 2024 Aug 28.
Article in English | MEDLINE | ID: mdl-39193779

ABSTRACT

BACKGROUND: Outcome measures that are count variables with excessive zeros are common in health behaviors research. Examples include the number of standard drinks consumed or alcohol-related problems experienced over time. There is a lack of empirical data about the relative performance of prevailing statistical models for assessing the efficacy of interventions when outcomes are zero-inflated, particularly compared with recently developed marginalized count regression approaches for such data. METHODS: The current simulation study examined five commonly used approaches for analyzing count outcomes, including two linear models (with outcomes on raw and log-transformed scales, respectively) and three prevailing count distribution-based models (i.e., Poisson, negative binomial, and zero-inflated Poisson (ZIP) models). We also considered the marginalized zero-inflated Poisson (MZIP) model, a novel alternative that estimates the overall effects on the population mean while adjusting for zero-inflation. Motivated by alcohol misuse prevention trials, extensive simulations were conducted to evaluate and compare the statistical power and Type I error rate of the statistical models and approaches across data conditions that varied in sample size (N = 100 to 500), zero rate (0.2 to 0.8), and intervention effect sizes. RESULTS: Under zero-inflation, the Poisson model failed to control the Type I error rate, resulting in higher than expected false positive results. When the intervention effects on the zero (vs. non-zero) and count parts were in the same direction, the MZIP model had the highest statistical power, followed by the linear model with outcomes on the raw scale, the negative binomial model, and the ZIP model. The performance of the linear model with a log-transformed outcome variable was unsatisfactory.
CONCLUSIONS: The MZIP model demonstrated better statistical properties in detecting true intervention effects and controlling false positive results for zero-inflated count outcomes. This MZIP model may serve as an appealing analytical approach to evaluating overall intervention effects in studies with count outcomes marked by excessive zeros.
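To see what "zero-inflated" means in practice, the sketch below (illustrative parameter values, not taken from the study) draws from a zero-inflated Poisson and compares its zero fraction with that of a plain Poisson having the same mean:

```python
import math
import random

def rpois(lam, rng):
    """Poisson draw via Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while p > limit:
        k += 1
        p *= rng.random()
    return k - 1

def rzip(pi_zero, lam, rng):
    """Zero-inflated Poisson: structural zero with prob pi_zero, else Poisson(lam)."""
    return 0 if rng.random() < pi_zero else rpois(lam, rng)

rng = random.Random(1)
n, pi_zero, lam = 20000, 0.4, 2.0
sample = [rzip(pi_zero, lam, rng) for _ in range(n)]

mean = sum(sample) / n              # approx (1 - pi_zero) * lam = 1.2
zero_frac = sample.count(0) / n     # approx 0.4 + 0.6 * exp(-2), about 0.48
poisson_zero = math.exp(-mean)      # zero prob of a Poisson with the same mean, about 0.30
```

The excess zeros (roughly 0.48 observed vs. 0.30 implied by a Poisson with the same mean) are exactly what the ZIP and MZIP models are built to accommodate.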

3.
Behav Res Methods ; 2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38886305

ABSTRACT

Recently, Asparouhov and Muthén (2021a, 2021b; Structural Equation Modeling: A Multidisciplinary Journal, 28, 1-14) proposed a variant of the Wald test that uses Markov chain Monte Carlo (MCMC) machinery to generate a chi-square test statistic for frequentist inference. Because the test's composition does not rely on analytic expressions for sampling variation and covariation, it potentially provides a way to get honest significance tests in cases where the likelihood-based test statistic's assumptions break down (e.g., in small samples). The goal of this study is to use simulation to compare the new MCMC Wald test to its maximum likelihood counterparts, with respect to both their type I error rate and power. Our simulation examined the test statistics across different levels of sample size, effect size, and degrees of freedom (test complexity). An additional goal was to assess the robustness of the MCMC Wald test with nonnormal data. The simulation results uniformly demonstrated that the MCMC Wald test was superior to the maximum likelihood test statistic, especially with small samples (e.g., sample sizes less than 150) and complex models (e.g., models with five or more predictors). This conclusion held for nonnormal data as well. Lastly, we provide a brief application to a real data example.

4.
Article in English | MEDLINE | ID: mdl-38923520

ABSTRACT

The analysis of multiple bivariate correlations is often carried out by conducting simple tests to check whether each of them is significantly different from zero. In addition, pairwise differences are often judged by eye or by comparing the p-values of the individual tests of significance despite the existence of statistical tests for differences between correlations. This paper uses simulation methods to assess the accuracy (empirical Type I error rate), power, and robustness of 10 tests designed to check the significance of the difference between two dependent correlations with overlapping variables (i.e., the correlation between X1 and Y and the correlation between X2 and Y). Five of the tests turned out to be inadvisable because their empirical Type I error rates under normality differ greatly from the nominal alpha level of .05 either across the board or within certain sub-ranges of the parameter space. The remaining five tests were acceptable and their merits were similar in terms of all comparison criteria, although none of them was robust across all forms of non-normality explored in the study. Practical recommendations are given for the choice of a statistical test to compare dependent correlations with overlapping variables.
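Among the classical candidates for this comparison is Williams' t test for two dependent correlations with an overlapping variable. The sketch below follows the formulation commonly attributed to Williams (1959) as presented by Steiger (1980); it is an illustration of the general idea, not a reproduction of the ten tests evaluated in the paper, and the data are made up:

```python
import math

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy)

def williams_t(r1y, r2y, r12, n):
    """Williams' t for H0: rho(X1,Y) == rho(X2,Y) with overlapping variable Y.
    Returns (t, df) with df = n - 3; formula as given in Steiger (1980)."""
    det = 1 - r1y**2 - r2y**2 - r12**2 + 2 * r1y * r2y * r12
    rbar = (r1y + r2y) / 2
    t = (r1y - r2y) * math.sqrt(
        (n - 1) * (1 + r12)
        / (2 * ((n - 1) / (n - 3)) * det + rbar**2 * (1 - r12) ** 3)
    )
    return t, n - 3

# Illustrative data: two predictors X1, X2 and one overlapping outcome Y.
x1 = [1.0, 2.0, 4.0, 3.5, 5.0, 7.0, 6.0, 8.0]
x2 = [2.0, 1.5, 3.0, 5.0, 4.5, 6.5, 7.0, 7.5]
y = [1.2, 1.9, 3.8, 4.1, 4.6, 7.2, 6.1, 7.9]
t, df = williams_t(pearson(x1, y), pearson(x2, y), pearson(x1, x2), len(y))
```

The statistic is antisymmetric in the two correlations being compared, so swapping them flips its sign, which is one easy sanity check on an implementation.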

5.
BMC Med Res Methodol ; 24(1): 124, 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38831421

ABSTRACT

BACKGROUND: Multi-arm multi-stage (MAMS) randomised trial designs have been proposed to evaluate multiple research questions in the confirmatory setting. In designs with several interventions, such as the 8-arm 3-stage ROSSINI-2 trial for preventing surgical wound infection, there are likely to be strict limits on the number of individuals that can be recruited or the funds available to support the protocol. These limitations may mean that not all research treatments can continue to accrue the required sample size for the definitive analysis of the primary outcome measure at the final stage. In these cases, an additional treatment selection rule can be applied at the early stages of the trial to restrict the maximum number of research arms that can progress to the subsequent stage(s). This article provides guidelines on how to implement treatment selection within the MAMS framework. It explores the impact of treatment selection rules, interim lack-of-benefit stopping boundaries and the timing of treatment selection on the operating characteristics of the MAMS selection design. METHODS: We outline the steps to design a MAMS selection trial. Extensive simulation studies are used to explore the maximum/expected sample sizes, familywise type I error rate (FWER), and overall power of the design under both binding and non-binding interim stopping boundaries for lack-of-benefit. RESULTS: Pre-specification of a treatment selection rule reduces the maximum sample size by approximately 25% in our simulations. The familywise type I error rate of a MAMS selection design is smaller than that of the standard MAMS design with similar design specifications without the additional treatment selection rule. In designs with strict selection rules - for example, when only one research arm is selected from 7 arms - the final stage significance levels can be relaxed for the primary analyses to ensure that the overall type I error for the trial is not underspent. 
When conducting treatment selection from several treatment arms, it is important to select a large enough subset of research arms (that is, more than one research arm) at early stages to maintain the overall power at the pre-specified level. CONCLUSIONS: Multi-arm multi-stage selection designs gain efficiency over the standard MAMS design by reducing the overall sample size. Diligent pre-specification of the treatment selection rule, final stage significance level and interim stopping boundaries for lack-of-benefit are key to controlling the operating characteristics of a MAMS selection design. We provide guidance on these design features to ensure control of the operating characteristics.
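A toy Monte Carlo (not the MAMS selection machinery itself; all parameter values are illustrative) shows why familywise error control matters when several research arms share one control: with four arms each tested one-sided at the 5% level, the chance of at least one false positive under the global null is roughly three times the per-comparison level:

```python
import math
import random

def z_stat(arm, control):
    """Two-sample z statistic assuming known unit variance in both groups."""
    n = len(arm)
    diff = sum(arm) / n - sum(control) / n
    return diff / math.sqrt(2.0 / n)

rng = random.Random(2)
K, n, sims, crit = 4, 50, 5000, 1.645   # one-sided 5% critical value per comparison
hits = 0
for _ in range(sims):
    control = [rng.gauss(0, 1) for _ in range(n)]          # all arms truly null
    arms = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(K)]
    if any(z_stat(a, control) > crit for a in arms):
        hits += 1
fwer = hits / sims   # around 0.15, well above the per-test 0.05
```

The shared control induces correlation between the arm-wise tests, which is why the familywise rate sits below the independent-tests bound 1 - 0.95**4 but far above 0.05.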


Subjects
Randomized Controlled Trials as Topic , Research Design , Humans , Randomized Controlled Trials as Topic/methods , Sample Size , Patient Selection
6.
Indian J Psychiatry ; 66(5): 472-476, 2024 May.
Article in English | MEDLINE | ID: mdl-38919569

ABSTRACT

In research, outcomes are often categorized as primary and secondary. The primary outcome is the most important one; it determines whether the study is considered 'successful' or not. Secondary outcomes are chosen because they provide supporting evidence for the results of the primary outcome or additional information about the subject being studied. For reasons that are explained in this paper, secondary outcomes should be cautiously interpreted. There are varying practices regarding publishing secondary outcomes. Some authors publish these separately, while others include them in the main publication. In some contexts, the former can lead to concerns about the quality and relevance of the data being published. In this article, we discuss primary and secondary outcomes, the importance and interpretation of secondary outcomes, and considerations for publishing multiple outcomes in separate papers. We also discuss the special case of secondary analyses and post hoc analyses and provide guidance on good publishing practices. Throughout the article, we use relevant examples to make these concepts easier to understand. While the article is primarily aimed at early career researchers, it offers insights that may be helpful to researchers, reviewers, and editors across all levels of expertise.

7.
Oecologia ; 205(2): 257-269, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38806949

ABSTRACT

Community weighted means (CWMs) are widely used to study the relationship between community-level functional traits and environment. For certain null hypotheses, CWM-environment relationships assessed by linear regression or ANOVA and tested by standard parametric tests are prone to inflated Type I error rates. Previous research has found that this problem can be solved by permutation tests (i.e., the max test). A recent extension of the CWM approach allows the inclusion of intraspecific trait variation (ITV) by the separate calculation of fixed, site-specific, and intraspecific CWMs. The question is whether the same Type I error rate inflation exists for the relationship between environment and site-specific or intraspecific CWM. Using simulated and real-world community datasets, we show that site-specific CWM-environment relationships also have an inflated Type I error rate, and this rate is negatively related to the relative ITV magnitude. In contrast, for intraspecific CWM-environment relationships, standard parametric tests have the correct Type I error rate, although with somewhat reduced statistical power. We introduce an ITV-extended version of the max test, which can solve the inflation problem for site-specific CWM-environment relationships and, without considering ITV, becomes equivalent to the "original" max test used for the CWM approach. We show that this new ITV-extended max test works well across the full possible magnitude of ITV on both simulated and real-world data. Most real datasets probably do not have intraspecific trait variation large enough to alleviate the problem of inflated Type I error rate, and published studies possibly report overly optimistic significance results.


Subjects
Ecosystem
8.
Stat Med ; 2024 Apr 04.
Article in English | MEDLINE | ID: mdl-38573319

ABSTRACT

The two-trials rule for drug approval requires "at least two adequate and well-controlled studies, each convincing on its own, to establish effectiveness." This is usually implemented by requiring two significant pivotal trials and is the standard regulatory requirement to provide evidence for a new drug's efficacy. However, there is a need to develop suitable alternatives to this rule for a number of reasons, including the possible availability of data from more than two trials. I consider the case of up to three studies and stress the importance of controlling the partial Type-I error rate, where only some studies have a true null effect, while maintaining the overall Type-I error rate of the two-trials rule, where all studies have a null effect. Some less-known P-value combination methods are useful to achieve this: Pearson's method, Edgington's method and the recently proposed harmonic mean χ²-test. I study their properties and discuss how they can be extended to a sequential assessment of success while still ensuring overall Type-I error control. I compare the different methods in terms of partial Type-I error rate, project power and the expected number of studies required. Edgington's method is eventually recommended as it is easy to implement and communicate, has only moderate partial Type-I error rate inflation but substantially increased project power.
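Edgington's method, the paper's eventual recommendation, combines k p-values through their sum. Assuming the standard formulation, the combined p-value is the Irwin-Hall CDF (the distribution function of a sum of k independent Uniform(0,1) variables) evaluated at that sum:

```python
import math

def edgington(pvalues):
    """Combined p-value for Edgington's method: P(U1 + ... + Uk <= s),
    the Irwin-Hall CDF at s = sum(pvalues), Uk iid Uniform(0,1) under H0."""
    k = len(pvalues)
    s = sum(pvalues)
    total = 0.0
    for j in range(int(math.floor(s)) + 1):
        total += (-1) ** j * math.comb(k, j) * (s - j) ** k
    return total / math.factorial(k)

# Two p-values of 0.25: P(U1 + U2 <= 0.5) = 0.5**2 / 2 = 0.125
p_combined = edgington([0.25, 0.25])
```

For small sums the alternating series reduces to s**k / k!, which is the easy special case to verify by hand.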

9.
Article in English | MEDLINE | ID: mdl-38623032

ABSTRACT

Inter-rater reliability (IRR) is one of the commonly used tools for assessing the quality of ratings from multiple raters. However, applicant selection procedures based on ratings from multiple raters usually result in a binary outcome; the applicant is either selected or not. This final outcome is not considered in IRR, which instead focuses on the ratings of the individual subjects or objects. We outline the connection between the ratings' measurement model (used for IRR) and a binary classification framework. We develop a simple way of approximating the probability of correctly selecting the best applicants which allows us to compute error probabilities of the selection procedure (i.e., false positive and false negative rate) or their lower bounds. We draw connections between the IRR and the binary classification metrics, showing that binary classification metrics depend solely on the IRR coefficient and proportion of selected applicants. We assess the performance of the approximation in a simulation study and apply it in an example comparing the reliability of multiple grant peer review selection procedures. We also discuss other possible uses of the explored connections in other contexts, such as educational testing, psychological assessment, and health-related measurement, and implement the computations in the R package IRR2FPR.

10.
J Am Stat Assoc ; 119(545): 332-342, 2024.
Article in English | MEDLINE | ID: mdl-38660582

ABSTRACT

Classical tests for a difference in means control the type I error rate when the groups are defined a priori. However, when the groups are instead defined via clustering, then applying a classical test yields an extremely inflated type I error rate. Notably, this problem persists even if two separate and independent data sets are used to define the groups and to test for a difference in their means. To address this problem, in this paper, we propose a selective inference approach to test for a difference in means between two clusters. Our procedure controls the selective type I error rate by accounting for the fact that the choice of null hypothesis was made based on the data. We describe how to efficiently compute exact p-values for clusters obtained using agglomerative hierarchical clustering with many commonly-used linkages. We apply our method to simulated data and to single-cell RNA-sequencing data.
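The inflation being addressed is easy to reproduce. In the toy example below (not the paper's selective inference procedure), a single homogeneous Gaussian sample is split at its median, a crude stand-in for 1-D two-means clustering, and a classical Welch t statistic between the two "clusters" comes out enormous even though no true groups exist:

```python
import math
import random
import statistics

def welch_t(a, b):
    """Welch two-sample t statistic (unequal variances)."""
    na, nb = len(a), len(b)
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

rng = random.Random(3)
x = [rng.gauss(0, 1) for _ in range(200)]   # ONE homogeneous population

# "Cluster" by splitting at the median, then test the clusters against each other.
x_sorted = sorted(x)
low, high = x_sorted[:100], x_sorted[100:]
t = welch_t(high, low)   # huge, despite there being no true group difference
```

Because the groups were defined to maximize their separation, the classical null distribution no longer applies; this is the selection effect the proposed selective test corrects for.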

11.
Cogn Neurosci ; 15(2): 79-82, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38647209

ABSTRACT

Steinkrauss and Slotnick (2024) reviewed neuroimaging studies linking the hippocampus with implicit memory. They conclude that there is no convincing evidence that the hippocampus is associated with implicit memory because prior studies are confounded by explicit memory (among other factors). Here, we ask a different yet equally important question: do reports of unconscious hippocampal activity reflect a Type-I error (i.e. a false positive)? We find that 39% of studies linking the hippocampus with implicit memory (7 of 18) do not report correcting for multiple comparisons. These results indicate that many unconscious hippocampal effects may reflect a Type-I error.
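For reference, a standard remedy the flagged studies omit is a multiple-comparisons adjustment; below is a minimal sketch of the Holm-Bonferroni step-down procedure with illustrative p-values (this is a generic correction, not one prescribed by the commentary):

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values (monotone, capped at 1)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiply the rank-th smallest p by (m - rank), enforcing monotonicity.
        running_max = max(running_max, (m - rank) * pvalues[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted

adj = holm_adjust([0.01, 0.04, 0.03, 0.50])
```

Comparing each adjusted value to the nominal alpha controls the familywise error rate across the set of tests.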


Subjects
Hippocampus , Hippocampus/physiology , Hippocampus/diagnostic imaging , Humans , Memory/physiology , Unconscious (Psychology)
12.
J Biopharm Stat ; : 1-14, 2024 Mar 21.
Article in English | MEDLINE | ID: mdl-38515269

ABSTRACT

In recent years, clinical trials utilizing a two-stage seamless adaptive trial design have become very popular in drug development. A typical example is a phase 2/3 adaptive trial design, which consists of two stages: stage 1 is a phase 2 dose-finding study and stage 2 is a phase 3 efficacy confirmation study. Depending upon whether or not the target patient population, study objectives, and study endpoints are the same at different stages, Chow (2020) classified two-stage seamless adaptive designs into eight categories. In practice, standard statistical methods for group sequential designs with one planned interim analysis are often wrongly applied directly for data analysis. In this article, following similar ideas proposed by Chow and Lin (2015) and Chow (2020), a statistical method for the analysis of a two-stage seamless adaptive trial design with different study endpoints and a shifted target patient population is discussed under the fundamental assumption that the study endpoints have a known relationship. The proposed analysis method should be useful both in clinical trials with protocol amendments and in clinical trials with disease progression utilizing a two-stage seamless adaptive trial design.

13.
BMC Public Health ; 24(1): 901, 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38539086

ABSTRACT

BACKGROUND: Count time series (e.g., daily deaths) are a very common type of data in environmental health research. The series is generally autocorrelated, while the widely used generalized linear model is based on the assumption of independent outcomes. None of the existing methods for modelling parameter-driven count time series can obtain consistent and reliable standard errors of parameter estimates, causing potential inflation of the type I error rate. METHODS: We proposed a new maximum significant ρ correction (MSRC) method that utilizes information from the significant autocorrelation coefficient ρ estimate within 5 orders by moment estimation. A Monte Carlo simulation was conducted to evaluate and compare the finite sample performance of the MSRC and the classical unbiased correction (UB-corrected) method. We demonstrated a real-data analysis assessing the effect of drunk driving regulations on the incidence of road traffic injuries (RTIs) using MSRC in Shenzhen, China. Moreover, no previous paper has assessed the time-varying intervention effect while considering autocorrelation based on daily RTI data. RESULTS: Both methods had a small bias in the regression coefficients. The autocorrelation coefficient estimated by UB-corrected is slightly underestimated at high autocorrelation (≥ 0.6), leading to inflation of the type I error rate. The new method controlled the type I error rate well when the sample size reached 340. Moreover, the power of MSRC increased with increasing sample size and effect size and decreasing nuisance parameters, and it approached UB-corrected when ρ was small (≤ 0.4) but became more reliable as autocorrelation increased further. The daily data on RTIs exhibited significant autocorrelation after controlling for potential confounding, and therefore the MSRC was preferable to the UB-corrected method. The intervention contributed to a decrease in the incidence of RTIs by 8.34% (95% CI, -5.69% to 20.51%), 45.07% (95% CI, 25.86% to 59.30%), and 42.94% (95% CI, 9.56% to 64.00%) at 1, 3, and 5 years after its implementation, respectively. CONCLUSIONS: The proposed MSRC method provides a reliable and consistent approach for modelling parameter-driven time series with autocorrelated count data. It offers improved estimation compared to existing methods. Strict drunk driving regulations can reduce the risk of RTIs.


Subjects
Time Factors , Humans , Linear Models , Computer Simulation , Bias , China
14.
Stat Med ; 43(9): 1688-1707, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38373827

ABSTRACT

Binary endpoints from two independent populations are among the most commonly used data types, and methods for testing them or designing trials around them have continued to be developed until recently. However, power and minimum required sample size comparisons between different tests may not be valid if their type I errors are not controlled at the same level. In this article, we unify all related testing procedures into a decision framework, including both frequentist and Bayesian methods. Sufficient conditions for the type I error to be attained at the boundary of the hypotheses are derived, which help reduce the magnitude of the exact calculations and lay out a foundation for developing computational algorithms to correctly specify the actual type I error. Efficient algorithms are thus proposed to calculate the cutoff value in a deterministic decision rule and the probability value in a randomized decision rule, such that the actual type I error is under but closest to, or equal to, the intended level, respectively. The algorithm may also be used to calculate the sample size to achieve the prespecified type I error and power. The usefulness of the proposed methodology is further demonstrated in the power calculation for designing superiority and noninferiority trials.
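The kind of exact calculation described can be illustrated by enumerating all outcomes of a two-arm binomial experiment. The sketch below (a pooled two-sided z-test with illustrative n and p, not the paper's algorithms) computes the actual type I error at a boundary null p1 = p2 = p:

```python
import math

def binom_pmf(n, k, p):
    """Binomial probability mass function."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def exact_size(n, p, z_crit):
    """Exact type I error of the two-sided pooled z-test under H0: p1 = p2 = p,
    enumerating all (x1, x2) outcomes with n subjects per arm."""
    size = 0.0
    for x1 in range(n + 1):
        for x2 in range(n + 1):
            p_pool = (x1 + x2) / (2 * n)
            if p_pool in (0.0, 1.0):
                continue                   # statistic undefined; the test never rejects
            se = math.sqrt(p_pool * (1 - p_pool) * 2 / n)
            z = (x1 / n - x2 / n) / se
            if abs(z) > z_crit:
                size += binom_pmf(n, x1, p) * binom_pmf(n, x2, p)
    return size

size = exact_size(n=20, p=0.3, z_crit=1.96)   # actual size at the 5% nominal level
```

Because the outcome space is discrete, the actual size generally differs from the nominal level, which is exactly why the paper's algorithms search for cutoffs whose actual type I error is as close as possible to the intended one.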


Subjects
Algorithms , Research Design , Humans , Bayes Theorem , Sample Size , Probability
15.
Clin Trials ; 21(2): 171-179, 2024 04.
Article in English | MEDLINE | ID: mdl-38311901

ABSTRACT

BACKGROUND: Pivotal evidence of efficacy of a new drug is typically generated by (at least) two clinical trials which independently provide statistically significant and mutually corroborating evidence of efficacy based on a primary endpoint. In this situation, showing drug effects on clinically important secondary objectives can be demanding in terms of sample size requirements. Statistically efficient methods to power for such endpoints while controlling the Type I error are needed. METHODS: We review existing strategies for establishing claims on important but sample size-intense secondary endpoints. We present new strategies based on combined data from two independent, identically designed and concurrent trials, controlling the Type I error at the submission level. We explain the methodology and provide three case studies. RESULTS: Different strategies have been used for establishing secondary claims. One new strategy, involving a protocol planned analysis of combined data across trials, and controlling the Type I error at the submission level, is particularly efficient. It has already been successfully used in support of label claims. Regulatory views on this strategy differ. CONCLUSIONS: Inference on combined data across trials is a useful approach for generating pivotal evidence of efficacy for important but sample size-intense secondary endpoints. It requires careful preparation and regulatory discussion.


Subjects
Research Design , Humans , Sample Size
16.
J Appl Stat ; 51(3): 481-496, 2024.
Article in English | MEDLINE | ID: mdl-38370269

ABSTRACT

In this note, we evaluated the type I error control of the commonly used t-test found in most statistical software packages for testing H0: ρ = 0 vs. H1: ρ > 0 based on the sample weighted Pearson correlation coefficient. We found that the type I error rate is severely inflated in general cases, even under bivariate normality. To address this issue, we derived the large-sample variance of the weighted Pearson correlation. Based on this result, we proposed an asymptotic test and a set of studentized permutation tests. A comprehensive set of simulation studies with a range of sample sizes and a variety of underlying distributions was conducted. The studentized permutation test based on Fisher's Z statistic was shown to robustly control the type I error even in small-sample and non-normality settings. The method was demonstrated with example data on country-level preterm birth rates.
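A minimal sketch of the ingredients: the weighted Pearson coefficient and a plain permutation test. This is deliberately simpler than the studentized permutation test the note recommends, and the data and weights are made up:

```python
import math
import random

def weighted_corr(x, y, w):
    """Weighted Pearson correlation coefficient."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    cxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    cyy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cxy / math.sqrt(cxx * cyy)

def perm_pvalue(x, y, w, n_perm=2000, seed=0):
    """One-sided (rho > 0) permutation p-value, shuffling y against (x, w) pairs."""
    rng = random.Random(seed)
    observed = weighted_corr(x, y, w)
    yy = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(yy)
        if weighted_corr(x, yy, w) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.3]
w = [1, 1, 2, 2, 2, 1, 1, 1]
r_w = weighted_corr(x, y, w)
p = perm_pvalue(x, y, w)
```

Studentizing (permuting a standardized statistic such as Fisher's Z over its estimated standard error) is what restores type I error control in the settings where the naive t-test fails.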

18.
Biom J ; 66(1): e2200312, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38285403

ABSTRACT

To accelerate a randomized controlled trial, historical control data may be used after ensuring little heterogeneity between the historical and current trials. The test-then-pool approach is a simple frequentist borrowing method that assesses the similarity between historical and current control data using a two-sided test. A limitation of the conventional test-then-pool method is the inability to control the type I error rate and power for the primary hypothesis separately and flexibly under heterogeneity between trials. This is because the two-sided test focuses on the absolute value of the mean difference between the historical and current controls. In this paper, we propose a new test-then-pool method that splits the two-sided hypothesis of the conventional method into two one-sided hypotheses. Testing each one-sided hypothesis at a different significance level allows the type I error rate and power to be controlled separately under heterogeneity between trials. We also propose a significance-level selection approach based on the maximum type I error rate and the minimum power. The proposed method prevented a decrease in power even when there was heterogeneity between trials, while controlling the type I error at a maximum tolerable rate larger than the targeted type I error rate. Applications to depression trial data and hypothetical trial data further supported the usefulness of the proposed method.

19.
Neuro Oncol ; 26(5): 796-810, 2024 May 03.
Article in English | MEDLINE | ID: mdl-38254183

ABSTRACT

BACKGROUND: Randomized controlled trials have been the gold standard for evaluating medical treatments for many decades, but they are often criticized for requiring large sample sizes. Given the urgent need for better therapies for glioblastoma, it has been argued that data collected from patients treated with the standard regimen can provide high-quality external control data to supplement or replace a concurrent control arm in future glioblastoma trials. METHODS: In this article, we provide an in-depth appraisal of the use of external control data in the context of neuro-oncology trials. We describe several clinical trial designs with particular attention to how external information is utilized and address common fallacies that may lead to inappropriate adoptions of external control data. RESULTS: Using 2 completed glioblastoma trials, we illustrate the use of an assessment tool that lays out a blueprint for assembling a high-quality external control data set. Using statistical simulations, we draw caution from scenarios where these approaches can fall short on controlling the type I error rate. CONCLUSIONS: While this approach may hold promise in generating informative data in certain settings, this sense of optimism should be tempered with a healthy dose of skepticism due to a myriad of design and analysis challenges articulated in this review. Importantly, careful planning is key to its successful implementation.


Subjects
Brain Neoplasms , Glioblastoma , Research Design , Humans , Research Design/standards , Brain Neoplasms/therapy , Glioblastoma/therapy , Clinical Trials as Topic/standards , Randomized Controlled Trials as Topic/methods
20.
Biom J ; 66(1): e2200102, 2024 Jan.
Article in English | MEDLINE | ID: mdl-36642800

ABSTRACT

When comparing the performance of two or more competing tests, simulation studies commonly focus on statistical power. However, if the sizes of the tests being compared differ from one another or from the nominal size, comparing tests based on power alone may be misleading. By analogy with diagnostic accuracy studies, we introduce relative positive and negative likelihood ratios to factor in both power and size in the comparison of multiple tests. We derive sample size formulas for a comparative simulation study. As an example, we compared the performance of six statistical tests for small-study effects in meta-analyses of randomized controlled trials: Begg's rank correlation, Egger's regression, Schwarzer's method for sparse data, the trim-and-fill method, the arcsine-Thompson test, and Lin and Chu's combined test. We illustrate that comparing power alone, or power adjusted or penalized for size, can be misleading, and how the proposed likelihood ratio approach enables accurate comparison of the trade-off between power and size between competing tests.
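Under our reading of the diagnostic-accuracy analogy, treating "a true effect is present" as the condition, a test's positive likelihood ratio is power/size and its negative likelihood ratio is (1 - power)/(1 - size); the paper's exact definitions of the *relative* ratios may differ. A sketch with illustrative numbers:

```python
def positive_lr(power, size):
    """P(reject | effect) / P(reject | no effect) -- higher is better."""
    return power / size

def negative_lr(power, size):
    """P(accept | effect) / P(accept | no effect) -- lower is better."""
    return (1 - power) / (1 - size)

# Test A: nominal size with decent power. Test B: nominally "more powerful",
# but the gain is bought with an inflated size; the likelihood ratios expose
# the worse trade-off that a power-only comparison would hide.
lr_a = positive_lr(0.80, 0.05)   # about 16
lr_b = positive_lr(0.85, 0.10)   # about 8.5
```

The relative likelihood ratio of two competing tests would then be the ratio lr_a / lr_b, summarizing the power-size trade-off in a single number.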


Subjects
Publication Bias , Computer Simulation , Sample Size