Results 1 - 20 of 151
1.
Am J Hum Genet ; 110(9): 1549-1563, 2023 09 07.
Article in English | MEDLINE | ID: mdl-37543033

ABSTRACT

There is currently little evidence that the genetic basis of human phenotype varies significantly across the lifespan. However, time-to-event phenotypes are understudied and can be thought of as reflecting an underlying hazard, which is unlikely to be constant through life when values take a broad range. Here, we find that 74% of 245 genome-wide significant genetic associations with age at natural menopause (ANM) in the UK Biobank show a form of age-specific effect. Nineteen of these replicated discoveries are identified only by our modeling framework, which determines the time dependency of DNA-variant age-at-onset associations without a significant multiple-testing burden. Across the range of early to late menopause, we find evidence for significantly different underlying biological pathways, changes in the signs of genetic correlations of ANM to health indicators and outcomes, and differences in inferred causal relationships. We find that DNA damage response processes only act to shape ovarian reserve and depletion for women of early ANM. Genetically mediated delays in ANM were associated with increased relative risk of breast cancer and leiomyoma at all ages and with high cholesterol and heart failure for late-ANM women. These findings suggest that a better understanding of the age dependency of genetic risk factor relationships among health indicators and outcomes is achievable through appropriate statistical modeling of large-scale biobank data.


Subject(s)
Aging; Menopause; Humans; Female; Aging/genetics; Menopause/genetics; Age of Onset; Ovary; Risk Factors; Age Factors
2.
Br J Anaesth ; 132(1): 116-123, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38030552

ABSTRACT

BACKGROUND: The American Statistical Association has highlighted problems with null hypothesis significance testing and outlined alternative approaches that may 'supplement or even replace P-values'. One alternative is to report the false positive risk (FPR), which quantifies the chance the null hypothesis is true when the result is statistically significant. METHODS: We reviewed single-centre, randomised trials in 10 anaesthesia journals over 6 yr where differences in a primary binary outcome were statistically significant. We calculated a Bayes factor by two methods (Gunel, Kass). From the Bayes factor we calculated the FPR for different prior beliefs for a real treatment effect. Prior beliefs were quantified by assigning pretest probabilities to the null and alternative hypotheses. RESULTS: For equal pretest probabilities of 0.5, the median (inter-quartile range [IQR]) FPR was 6% (1-22%) by the Gunel method and 6% (1-19%) by the Kass method. One in five trials had an FPR ≥20%. For trials reporting P-values 0.01-0.05, the median (IQR) FPR was 25% (16-30%) by the Gunel method and 20% (16-25%) by the Kass method. More than 90% of trials reporting P-values 0.01-0.05 required a pretest probability >0.5 to achieve an FPR of 5%. The median (IQR) difference in the FPR calculated by the two methods was 0% (0-2%). CONCLUSIONS: Our findings suggest that a substantial proportion of single-centre trials in anaesthesia reporting statistically significant differences provide limited evidence of real treatment effects, or, alternatively, required an implausibly high prior belief in a real treatment effect. CLINICAL TRIAL REGISTRATION: PROSPERO (CRD42023350783).
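
The arithmetic behind the FPR is just Bayes' rule on the odds scale. A minimal sketch (illustrative only; it is not the Gunel or Kass Bayes factor computations themselves, which work from the trial's outcome tables):

```python
def false_positive_risk(bf01: float, prior_null: float = 0.5) -> float:
    """Posterior probability that the null is true, given BF01 (the Bayes
    factor for the null over the alternative) and a pretest null probability."""
    prior_odds = prior_null / (1.0 - prior_null)
    posterior_odds = bf01 * prior_odds      # Bayes' rule on the odds scale
    return posterior_odds / (1.0 + posterior_odds)

# An illustrative BF01 of 0.064 with equal pretest probabilities gives an
# FPR of about 6%, the median reported in this review.
print(round(false_positive_risk(0.064), 3))
```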


Subject(s)
Anesthesia; Anesthesiology; Humans; Bayes Theorem; Data Interpretation, Statistical; Research Design
3.
BMC Biol ; 21(1): 116, 2023 05 23.
Article in English | MEDLINE | ID: mdl-37217976

ABSTRACT

Canadian policymakers are interested in determining whether farmed Atlantic salmon, frequently infected with Piscine orthoreovirus (PRV), may threaten wild salmon populations in the Pacific Northwest. Relevant work has been published in BMC Biology by Polinski and colleagues, but their conclusion that PRV has a negligible impact on the energy expenditure and respiratory performance of sockeye salmon is disputed by Mordecai and colleagues, whose re-analysis is presented in a correspondence article. So, what is the true effect, and what should follow this unresolved dispute? We suggest a 'registered multi-lab replication with adversaries'.


Subject(s)
Reoviridae Infections; Animals; Reoviridae Infections/virology; Dissent and Disputes; Canada; Salmon
4.
Behav Res Methods ; 56(2): 826-845, 2024 Feb.
Article in English | MEDLINE | ID: mdl-36869217

ABSTRACT

Statistical methods generally have assumptions (e.g., normality in linear regression models). Violations of these assumptions can cause various issues, like statistical errors and biased estimates, whose impact can range from inconsequential to critical. Accordingly, it is important to check these assumptions, but this is often done in a flawed way. Here, I first present a prevalent but problematic approach to diagnostics: testing assumptions using null hypothesis significance tests (e.g., the Shapiro-Wilk test of normality). Then, I consolidate and illustrate the issues with this approach, primarily using simulations. These issues include statistical errors (i.e., false positives, especially with large samples, and false negatives, especially with small samples), false binarity, limited descriptiveness, misinterpretation (e.g., of p-value as an effect size), and potential testing failure due to unmet test assumptions. Finally, I synthesize the implications of these issues for statistical diagnostics, and provide practical recommendations for improving such diagnostics. Key recommendations include maintaining awareness of the issues with assumption tests (while recognizing they can be useful), using appropriate combinations of diagnostic methods (including visualization and effect sizes) while recognizing their limitations, and distinguishing between testing and checking assumptions. Additional recommendations include judging assumption violations as a complex spectrum (rather than a simplistic binary), using programmatic tools that increase replicability and decrease researcher degrees of freedom, and sharing the material and rationale involved in the diagnostics.
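
The large-sample false-positive problem flagged here is easy to see in a few lines. A small illustrative simulation (my own sketch under assumed parameters, not taken from the paper): the same mildly skewed distribution sails through the Shapiro-Wilk test at small n and fails it decisively at large n.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
for n in (20, 200, 5000):
    # identically (and only mildly) skewed data at every sample size
    x = rng.gamma(shape=50.0, scale=1.0, size=n)
    w, p = stats.shapiro(x)
    print(f"n = {n:5d}   W = {w:.4f}   p = {p:.4f}")
# Typically p > .05 at n = 20 (violation missed) and p < .05 at n = 5000
# (trivial violation flagged): the false binarity discussed above.
```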


Subject(s)
Data Visualization; Linear Models
5.
Behav Res Methods ; 2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38886305

ABSTRACT

Recently, Asparouhov and Muthén (Structural Equation Modeling: A Multidisciplinary Journal, 28, 1-14, 2021a, 2021b) proposed a variant of the Wald test that uses Markov chain Monte Carlo (MCMC) machinery to generate a chi-square test statistic for frequentist inference. Because the test's composition does not rely on analytic expressions for sampling variation and covariation, it potentially provides a way to get honest significance tests in cases where the likelihood-based test statistic's assumptions break down (e.g., in small samples). The goal of this study is to use simulation to compare the new MCMC Wald test to its maximum likelihood counterparts, with respect to both their type I error rate and power. Our simulation examined the test statistics across different levels of sample size, effect size, and degrees of freedom (test complexity). An additional goal was to assess the robustness of the MCMC Wald test with nonnormal data. The simulation results uniformly demonstrated that the MCMC Wald test was superior to the maximum likelihood test statistic, especially with small samples (e.g., sample sizes less than 150) and complex models (e.g., models with five or more predictors). This conclusion held for nonnormal data as well. Lastly, we provide a brief application to a real data example.
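
The general construction can be sketched compactly. The following is my own simplification (not Asparouhov and Muthén's exact algorithm): use the posterior mean of the constrained parameters as the point estimate, the covariance of the MCMC draws as its sampling covariance, and form the familiar quadratic-form Wald statistic.

```python
import numpy as np
from scipy import stats

def mcmc_wald(draws: np.ndarray, null_value: float = 0.0):
    """Wald-type chi-square test from MCMC output.

    draws: (n_draws, k) posterior samples for the k parameters
    constrained under H0. Returns (W, p)."""
    draws = np.atleast_2d(draws)
    theta = draws.mean(axis=0) - null_value   # stands in for the ML estimate
    cov = np.cov(draws, rowvar=False)         # stands in for its sampling covariance
    w = float(theta @ np.linalg.solve(np.atleast_2d(cov), theta))
    return w, stats.chi2.sf(w, df=draws.shape[1])
```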

6.
Entropy (Basel) ; 26(6), 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38920515

ABSTRACT

Information-theoretic (IT) and multi-model averaging (MMA) statistical approaches are widely used but suboptimal tools for pursuing a multifactorial approach (also known as the method of multiple working hypotheses) in ecology. (1) Conceptually, IT encourages ecologists to perform tests on sets of artificially simplified models. (2) MMA improves on IT model selection by implementing a simple form of shrinkage estimation (a way to make accurate predictions from a model with many parameters relative to the amount of data, by "shrinking" parameter estimates toward zero). However, other shrinkage estimators such as penalized regression or Bayesian hierarchical models with regularizing priors are more computationally efficient and better supported theoretically. (3) In general, the procedures for extracting confidence intervals from MMA are overconfident, providing overly narrow intervals. If researchers want to use limited data sets to accurately estimate the strength of multiple competing ecological processes along with reliable confidence intervals, the current best approach is to use full (maximal) statistical models (possibly with Bayesian priors) after making principled, a priori decisions about model complexity.
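
For readers who want to see the shrinkage point concretely, here is a toy sketch (mine, not the authors'): with many coefficients relative to the data, a penalized fit such as ridge regression pulls estimates toward zero and typically predicts new data better than unpenalized least squares.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n, p = 60, 20                           # few observations per parameter
X = rng.normal(size=(n, p))
beta = np.r_[np.full(3, 0.5), np.zeros(p - 3)]
y = X @ beta + rng.normal(size=n)
X_new = rng.normal(size=(1000, p))      # fresh data to score predictions
y_new = X_new @ beta + rng.normal(size=1000)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)     # penalty would be tuned by CV in practice
for name, model in [("OLS  ", ols), ("ridge", ridge)]:
    print(name, round(mean_squared_error(y_new, model.predict(X_new)), 3))
```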

7.
Anaesthesia ; 78(1): 73-80, 2023 01.
Article in English | MEDLINE | ID: mdl-36128627

ABSTRACT

Are the results of randomised trials reliable and are p values and confidence intervals the best way of quantifying efficacy? Low power is common in medical research, which reduces the probability of obtaining a 'significant result' and declaring the intervention had an effect. Metrics derived from Bayesian methods may provide an insight into trial data unavailable from p values and confidence intervals. We did a structured review of multicentre trials in anaesthesia that were published in the New England Journal of Medicine, The Lancet, Journal of the American Medical Association, British Journal of Anaesthesia and Anesthesiology between February 2011 and November 2021. We documented whether trials declared a non-zero effect by an intervention on the primary outcome. We documented the expected and observed effect sizes. We calculated a Bayes factor from the published trial data indicating the probability of the data under the null hypothesis of zero effect relative to the alternative hypothesis of a non-zero effect. We used the Bayes factor to calculate the post-test probability of zero effect for the intervention (having assumed 50% belief in zero effect before the trial). We contacted all authors to estimate the costs of running the trials. The median (IQR [range]) hypothesised and observed absolute effect sizes were 7% (3-13% [0-25%]) vs. 2% (1-7% [0-24%]), respectively. Non-zero effects were declared for 12/56 outcomes (21%). The Bayes factor favouring a zero effect relative to a non-zero effect for these 12 trials was 0.000001-1.9, with post-test zero effect probabilities for the intervention of 0.0001-65%. The other 44 trials did not declare non-zero effects, with Bayes factors favouring zero effect of 1-688, and post-test probabilities of zero effect of 53-99%. The median (IQR [range]) study costs reported by 20 corresponding authors in US$ were $1,425,669 ($514,766-$2,526,807 [$120,758-$24,763,921]). We think that inadequate power and mortality as an outcome are why few trials declared non-zero effects. Bayes factors and post-test probabilities provide a useful insight into trial results, particularly when p values approximate the significance threshold.


Subject(s)
Anesthesia; United States; Humans; Bayes Theorem
8.
Proc Natl Acad Sci U S A ; 117(11): 5559-5567, 2020 03 17.
Article in English | MEDLINE | ID: mdl-32127477

ABSTRACT

The perceived replication crisis and the reforms designed to address it are grounded in the notion that science is a binary signal detection problem. However, contrary to null hypothesis significance testing (NHST) logic, the magnitude of the underlying effect size for a given experiment is best conceptualized as a random draw from a continuous distribution, not as a random draw from a dichotomous distribution (null vs. alternative). Moreover, because continuously distributed effects selected using a P < 0.05 filter must be inflated, the fact that they are smaller when replicated (reflecting regression to the mean) is no reason to sound the alarm. Considered from this perspective, recent replication efforts suggest that most published P < 0.05 scientific findings are "true" (i.e., in the correct direction), with observed effect sizes that are inflated to varying degrees. We propose that original science is a screening process, one that adopts NHST logic as a useful fiction for selecting true effects that are potentially large enough to be of interest to other scientists. Unlike original science, replication science seeks to precisely measure the underlying effect size associated with an experimental protocol via large-N direct replication, without regard for statistical significance. Registered reports are well suited to (often resource-intensive) direct replications, which should focus on influential findings and be published regardless of outcome. Conceptual replications play an important but separate role in validating theories. However, because they are part of NHST-based original science, conceptual replications cannot serve as the field's self-correction mechanism. Only direct replications can do that.
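
The inflation-under-selection argument can be reproduced with a short simulation (an illustrative sketch under assumed parameter values, not the authors' analysis): draw true effects from a continuous distribution, keep only studies passing a P < 0.05 filter, and compare observed with true effect sizes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 50                                        # per-group sample size
true_d = rng.normal(0.0, 0.3, size=20_000)    # continuous effect-size distribution
se = np.sqrt(2.0 / n)                         # approximate SE of Cohen's d
obs_d = true_d + rng.normal(0.0, se, size=true_d.size)
p = 2 * stats.t.sf(np.abs(obs_d / se), df=2 * n - 2)
sig = p < 0.05                                # the publication filter

print("mean |obs d|  (significant) :", np.abs(obs_d[sig]).mean().round(3))
print("mean |true d| (significant) :", np.abs(true_d[sig]).mean().round(3))
print("share with the correct sign :",
      (np.sign(obs_d[sig]) == np.sign(true_d[sig])).mean().round(3))
# Selected effects are inflated, yet mostly in the true direction,
# matching the abstract's reading of replication results.
```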

9.
Acta Obstet Gynecol Scand ; 101(6): 624-627, 2022 06.
Article in English | MEDLINE | ID: mdl-35451497

ABSTRACT

Traditional null hypothesis significance testing (NHST) incorporating the critical level of significance of 0.05 has become the cornerstone of decision-making in health care, not least in obstetric and gynecological research. However, such practice is controversial. In particular, it was never intended for clinical significance to be inferred from statistical significance. The inference of clinical importance based on statistical significance (p < 0.05), and of a lack of clinical significance otherwise (p ≥ 0.05), represents a misunderstanding of the original purpose of NHST. Furthermore, the limitations of NHST (sensitivity to sample size, plus type I and II errors) are frequently ignored. Therefore, decision-making based on NHST has the potential for recurrent false claims about the effectiveness of interventions or the importance of exposure to risk factors, or for dismissal of important ones. This commentary presents the history behind NHST, along with the limitations that modern-day NHST presents, and suggests that a statistics reform regarding NHST be considered.


Subject(s)
Research Design; Humans; Sample Size
10.
J Phys D Appl Phys ; 55(32)2022 Aug 11.
Article in English | MEDLINE | ID: mdl-35726230

ABSTRACT

Estimating the statistical significance of the difference between two spectra or series is a fundamental statistical problem. Multivariate significance tests exist, but their limitations preclude their use in many common cases, e.g., one-sided testing, unequal variances, and few acquired repetitions, all of which arise in magnetic spectroscopy of nanoparticle Brownian motion (MSB). We introduce a test, termed the T-S test, that is powerful and exact (exact type I error). It is flexible enough to be one- or two-sided, and the one-sided version can specify arbitrary regions where each spectrum should be larger. The T-S test takes the one- or two-sided p-value at each frequency and combines them using Stouffer's method. We evaluated it using simulated spectra and measured MSB spectra. For the one-sided version, the mean of the spectrum (A-T) was used as a reference; the T-S test is as powerful when the variance at each frequency is uniform and outperforms it when the noise power is not uniform. For the two-sided version, the Hotelling T2 two-sided multivariate test was used as a reference; the two-sided T-S test is only slightly less powerful for large numbers of repetitions and outperforms it rather dramatically for small numbers of repetitions. The T-S test was used to estimate the sensitivity of our current MSB spectrometer, showing 1 nanogram sensitivity. Using eight repetitions, the T-S test allowed 15 pM concentrations of mouse IL-6 to be identified, while the mean of the spectra only identified 76 pM.
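
The combining step is standard Stouffer weighting of per-frequency p-values. A minimal sketch of that step alone (one-sided case; the published T-S test adds two-sided handling and user-specified sign regions):

```python
import numpy as np
from scipy import stats

def stouffer_combine(pvals):
    """Combine independent one-sided p-values (one per frequency)."""
    z = stats.norm.isf(np.asarray(pvals, dtype=float))  # small p -> large z
    z_comb = z.sum() / np.sqrt(z.size)
    return stats.norm.sf(z_comb)                        # combined one-sided p

print(stouffer_combine([0.04, 0.10, 0.03, 0.20]))       # ~0.002
```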

11.
J Extra Corpor Technol ; 54(4): 324-329, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36742025

ABSTRACT

In this article, I discuss the potential pitfalls of interpreting p values, confidence intervals, and declarations of statistical significance. To illustrate the issues, I discuss the LOVIT trial, which compared high-dose vitamin C with placebo in mechanically ventilated patients with sepsis. The primary outcome (the proportion of patients who died or had persisting organ dysfunction at day 28) was significantly higher in patients who received vitamin C (p = .01). The authors had hypothesized that vitamin C would have a beneficial effect, although the prior evidence for benefit was weak. There was no prior evidence for a harmful effect of high-dose vitamin C. Consequently, the pretest probability for harm was low. The sample size was calculated assuming a 10% absolute risk difference, which was optimistic. Overestimating the effect size when calculating the sample size leads to low power. For these reasons, we should be skeptical that vitamin C causes harm in septic patients, despite the significant result. p-values and confidence intervals are probabilities concerning the chance of obtaining the observed data. However, we are more interested in the chance the intervention has a real effect on the outcome. That is to say, we are more interested in whether the hypothesis is true. A Bayesian approach allows us to estimate the false positive risk, which is the post-test probability there is no effect of the intervention. The false positive risk for the LOVIT trial (calculated from the published summary data using uniform priors for the parameter values) is 70%. Most likely, high-dose vitamin C does not cause harm in septic patients. Most likely it has no effect at all. If there is an effect, it is probably small and most likely beneficial.


Subject(s)
Ascorbic Acid; Sepsis; Humans; Ascorbic Acid/therapeutic use; Bayes Theorem; Sepsis/diagnosis; Sepsis/drug therapy
12.
Behav Res Methods ; 54(3): 1114-1130, 2022 06.
Article in English | MEDLINE | ID: mdl-34471963

ABSTRACT

Hypothesis testing is a central statistical method in psychology and the cognitive sciences. The problems of null hypothesis significance testing (NHST) and p values have been debated widely, but few attractive alternatives exist. This article introduces the fbst R package, which implements the Full Bayesian Significance Test (FBST) to test a sharp null hypothesis against its alternative via the e value. The statistical theory of the FBST was introduced more than two decades ago, and since then the FBST has been shown to be a Bayesian alternative to NHST and p values with highly appealing theoretical and practical properties. The algorithm provided in the fbst package is applicable to any Bayesian model as long as the posterior distribution can be obtained at least numerically. The core function of the package provides the Bayesian evidence against the null hypothesis, the e value. Additionally, p values based on asymptotic arguments can be computed, and rich visualizations for communication and interpretation of the results can be produced. Three examples of frequently used statistical procedures in the cognitive sciences are given in this paper, demonstrating how to apply the FBST in practice using the fbst package. Based on the success of the FBST in statistical science, the fbst package should be of interest to a broad range of researchers and will hopefully encourage researchers to consider the FBST as a possible alternative when conducting hypothesis tests of a sharp null hypothesis.
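
For intuition, the e value has a simple Monte Carlo form once posterior draws are available. A minimal one-dimensional sketch (the fbst package itself is more general and more numerically careful):

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

def fbst_ev(posterior_draws, theta0=0.0):
    """e-value in favour of the sharp null theta = theta0: one minus the
    posterior mass where the density exceeds the density at theta0."""
    kde = gaussian_kde(posterior_draws)
    tangential_mass = np.mean(kde(posterior_draws) > kde(theta0)[0])
    return 1.0 - tangential_mass        # small ev => evidence against H0

draws = norm.rvs(loc=0.8, scale=0.3, size=5000, random_state=0)
print(round(fbst_ev(draws, theta0=0.0), 3))  # posterior far from 0 => small ev
```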


Subject(s)
Research Design; Bayes Theorem; Humans
13.
Neuroimage ; 236: 118052, 2021 08 01.
Article in English | MEDLINE | ID: mdl-33857618

ABSTRACT

Technological and data sharing advances have led to a proliferation of high-resolution structural and functional maps of the brain. Modern neuroimaging research increasingly depends on identifying correspondences between the topographies of these maps; however, most standard methods for statistical inference fail to account for their spatial properties. Recently, multiple methods have been developed to generate null distributions that preserve the spatial autocorrelation of brain maps and yield more accurate statistical estimates. Here, we comprehensively assess the performance of ten published null frameworks in statistical analyses of neuroimaging data. To test the efficacy of these frameworks in situations with a known ground truth, we first apply them to a series of controlled simulations and examine the impact of data resolution and spatial autocorrelation on their family-wise error rates. Next, we use each framework with two empirical neuroimaging datasets, investigating their performance when testing (1) the correspondence between brain maps (e.g., correlating two activation maps) and (2) the spatial distribution of a feature within a partition (e.g., quantifying the specificity of an activation map within an intrinsic functional network). Finally, we investigate how differences in the implementation of these null models may impact their performance. In agreement with previous reports, we find that naive null models that do not preserve spatial autocorrelation consistently yield elevated false positive rates and unrealistically liberal statistical estimates. While spatially constrained null models yielded more realistic, conservative estimates, even these frameworks suffer from inflated false positive rates and variable performance across analyses. Throughout our results, we observe minimal impact of parcellation and resolution on null model performance. Altogether, our findings highlight the need for continued development of statistically rigorous methods for comparing brain maps. The present report provides a harmonised framework for benchmarking and comparing future advancements.
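
As a one-dimensional toy analogue of such spatially constrained nulls (not one of the ten published frameworks; real brain-map analyses use spin tests or generative models such as variogram matching), phase randomization builds surrogates that preserve a map's autocorrelation while destroying its alignment with another map:

```python
import numpy as np
from scipy import stats

def phase_randomize(x, rng):
    """Surrogate with the same power spectrum as x (hence the same
    autocorrelation), but with randomized phases."""
    f = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=f.size)
    phases[0] = 0.0                                   # keep the mean component
    return np.fft.irfft(np.abs(f) * np.exp(1j * phases), n=x.size)

def smooth(z, k=50):
    return np.convolve(z, np.ones(k) / k, mode="valid")  # induces autocorrelation

rng = np.random.default_rng(3)
n, k = 2000, 50
map1 = smooth(rng.normal(size=n + k - 1))             # two independent but
map2 = smooth(rng.normal(size=n + k - 1))             # autocorrelated "maps"

r_obs = np.corrcoef(map1, map2)[0, 1]
p_naive = stats.pearsonr(map1, map2)[1]               # ignores autocorrelation
null_r = np.array([np.corrcoef(phase_randomize(map1, rng), map2)[0, 1]
                   for _ in range(1000)])
p_surrogate = np.mean(np.abs(null_r) >= np.abs(r_obs))
print(f"r = {r_obs:+.3f}, naive p = {p_naive:.2e}, surrogate p = {p_surrogate:.3f}")
```

For two unrelated smooth maps, the naive p is routinely tiny while the autocorrelation-preserving null is not: the elevated false positive rate of naive nulls described above.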


Subject(s)
Brain Mapping/methods; Models, Statistical; Nerve Net/diagnostic imaging; Connectome; Humans
14.
Hum Brain Mapp ; 42(18): 5803-5813, 2021 12 15.
Article in English | MEDLINE | ID: mdl-34529303

ABSTRACT

Null hypothesis significance testing is the major statistical procedure in fMRI but provides only a rather limited picture of the effects in a data set. When sample size and power are low, relying only on strict significance testing may lead to a host of false negative findings. In contrast, with very large data sets virtually every voxel might become significant. It is thus desirable to complement significance testing with procedures like inferiority and equivalence tests that allow formal comparison of effect sizes within and between data sets and offer novel approaches to gaining insight into fMRI data. The major components of these tests are estimates of standardized effect sizes and their confidence intervals. Here, we show how Hedges' g, the bias-corrected version of Cohen's d, and its confidence interval can be obtained from SPM t maps. We then demonstrate how these values can be used to evaluate whether nonsignificant effects are really statistically smaller than significant effects, yielding "regions of undecidability" within a data set, and to test for the replicability and lateralization of effects. This method allows the analysis of fMRI data beyond point estimates, enabling researchers to take measurement uncertainty into account when interpreting their findings.
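
The point estimate is straightforward to reproduce. A sketch for a one-sample (second-level) t map, using the usual small-sample bias correction and a large-sample interval rather than the exact noncentral-t interval described in the paper:

```python
import numpy as np
from scipy import stats

def hedges_g_from_t(t, n, conf=0.95):
    """Hedges' g and an approximate CI from a one-sample t over n subjects."""
    df = n - 1
    d = t / np.sqrt(n)                            # Cohen's d, one-sample design
    J = 1.0 - 3.0 / (4.0 * df - 1.0)              # small-sample bias correction
    g = J * d
    se = J * np.sqrt(1.0 / n + d**2 / (2.0 * n))  # large-sample approximation
    z = stats.norm.ppf(0.5 + conf / 2.0)
    return g, (g - z * se, g + z * se)

print(hedges_g_from_t(t=3.2, n=25))               # e.g., one voxel of a t map
```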


Subject(s)
Brain/diagnostic imaging; Brain/physiology; Data Interpretation, Statistical; Functional Neuroimaging; Image Processing, Computer-Assisted; Magnetic Resonance Imaging; Functional Neuroimaging/methods; Functional Neuroimaging/standards; Humans; Image Processing, Computer-Assisted/methods; Image Processing, Computer-Assisted/standards; Magnetic Resonance Imaging/methods; Magnetic Resonance Imaging/standards
15.
Stat Sci ; 36(4): 562-577, 2021 Nov.
Article in English | MEDLINE | ID: mdl-37860618

ABSTRACT

A great deal of interest has recently focused on conducting inference on the parameters in a high-dimensional linear model. In this paper, we consider a simple and very naïve two-step procedure for this task, in which we (i) fit a lasso model in order to obtain a subset of the variables, and (ii) fit a least squares model on the lasso-selected set. Conventional statistical wisdom tells us that we cannot make use of the standard statistical inference tools for the resulting least squares model (such as confidence intervals and p-values), since we peeked at the data twice: once in running the lasso, and again in fitting the least squares model. However, in this paper, we show that under a certain set of assumptions, with high probability, the set of variables selected by the lasso is identical to the one selected by the noiseless lasso and is hence deterministic. Consequently, the naïve two-step approach can yield asymptotically valid inference. We utilize this finding to develop the naïve confidence interval, which can be used to draw inference on the regression coefficients of the model selected by the lasso, as well as the naïve score test, which can be used to test the hypotheses regarding the full-model regression coefficients.
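
The two-step procedure itself takes only a few lines; here is a sketch of the workflow the paper analyzes (its asymptotic validity rests, of course, on the paper's assumptions about the design and signal strength):

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(4)
n, p = 200, 50
X = rng.normal(size=(n, p))
beta = np.r_[np.ones(3), np.zeros(p - 3)]
y = X @ beta + rng.normal(size=n)

# step (i): lasso to select a subset of the variables
selected = np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_)

# step (ii): least squares on the selected set, with the usual
# ("naive") confidence intervals read off directly
ols = sm.OLS(y, sm.add_constant(X[:, selected])).fit()
print(ols.conf_int())
```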

16.
Paediatr Perinat Epidemiol ; 35(1): 8-23, 2021 01.
Article in English | MEDLINE | ID: mdl-33269490

ABSTRACT

The "replication crisis" has been attributed to perverse incentives that lead to selective reporting and misinterpretations of P-values and confidence intervals. A crude fix offered for this problem is to lower testing cut-offs (α levels), either directly or in the form of null-biased multiple comparisons procedures such as naïve Bonferroni adjustments. Methodologists and statisticians have expressed positions that range from condemning all such procedures to demanding their application in almost all analyses. Navigating between these unjustifiable extremes requires defining analysis goals precisely enough to separate inappropriate from appropriate adjustments. To meet this need, I here review issues arising in single-parameter inference (such as error costs and loss functions) that are often skipped in basic statistics, yet are crucial to understanding controversies in testing and multiple comparisons. I also review considerations that should be made when examining arguments for and against modifications of decision cut-offs and adjustments for multiple comparisons. The goal is to provide researchers a better understanding of what is assumed by each side and to enable recognition of hidden assumptions. Basic issues of goal specification and error costs are illustrated with simple fixed cut-off hypothesis testing scenarios. These illustrations show how adjustment choices are extremely sensitive to implicit decision costs, making it inevitable that different stakeholders will vehemently disagree about what is necessary or appropriate. Because decisions cannot be justified without explicit costs, resolution of inference controversies is impossible without recognising this sensitivity. Pre-analysis statements of funding, scientific goals, and analysis plans can help counter demands for inappropriate adjustments, and can provide guidance as to what adjustments are advisable. Hierarchical (multilevel) regression methods (including Bayesian, semi-Bayes, and empirical-Bayes methods) provide preferable alternatives to conventional adjustments, insofar as they facilitate use of background information in the analysis model, and thus can provide better-informed estimates on which to base inferences and decisions.


Subject(s)
Goals; Research Design; Bayes Theorem; Humans
17.
J Card Surg ; 36(11): 4322-4331, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34477260

ABSTRACT

Null hypothesis significance testing (NHST) and p-values are widespread in the cardiac surgical literature but are frequently misunderstood and misused. The purpose of the review is to discuss major disadvantages of p-values and suggest alternatives. We describe diagnostic tests, the prosecutor's fallacy in the courtroom, and NHST, which involve inter-related conditional probabilities, to help clarify the meaning of p-values, and discuss the enormous sampling variability, or unreliability, of p-values. Finally, we use a cardiac surgical database and simulations to explore further issues involving p-values. In clinical studies, p-values provide a poor summary of the observed treatment effect, whereas the three-number summary provided by effect estimates and confidence intervals is more informative and minimizes over-interpretation of a "significant" result. p-values are an unreliable measure of the strength of evidence; if used at all they give only, at best, a very rough guide to decision making. Researchers should adopt Open Science practices to improve the trustworthiness of research and, where possible, use estimation (three-number summaries) or other better techniques.
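
The "enormous sampling variability" of p-values is easy to demonstrate (an illustrative simulation, not the paper's database analysis): exact replications of a reasonably powered trial produce p-values spanning several orders of magnitude.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, true_diff = 64, 0.5                  # ~80% power for a two-sample t-test
pvals = np.array([stats.ttest_ind(rng.normal(true_diff, 1, n),
                                  rng.normal(0, 1, n)).pvalue
                  for _ in range(10_000)])
print("2.5th / 50th / 97.5th percentile p-values:",
      np.round(np.percentile(pvals, [2.5, 50, 97.5]), 5))
```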


Subject(s)
Research Design; Bayes Theorem; Humans; Probability
18.
Pharm Stat ; 20(3): 610-644, 2021 05.
Article in English | MEDLINE | ID: mdl-33565236

ABSTRACT

Sample size calculation is an essential component of the planning phase of a clinical trial. In the context of single-arm clinical trials with time-to-event (TTE) endpoints, only a few options with limited design features are available. Motivated from ethical or practical considerations, two-stage designs are implemented for single-arm studies to obtain early evidence of futility. A major drawback of such designs is that early stopping may only occur at the conclusion of the first stage, even if lack of efficacy becomes apparent at any other time point over the course of the clinical trial. In this manuscript, we attempt to fill some existing gaps in the literature related to single-arm clinical trials with TTE endpoints. We propose a parametric maximum likelihood estimate-based test whose variance component accounts for the expected proportion of loss to follow-up and different accrual patterns (early, late, or uniform accrual). For the proposed method, we present three stochastic curtailment methods (conditional power, predictive power, Bayesian predictive probability) which can be employed for efficacy or futility testing purposes. Finally, we discuss the implementation of group sequential designs for obtaining an early evidence of efficacy or futility at pre-planned timings of interim analyses. Through extensive simulations, it is shown that our proposed method performs well for designing these studies with moderate to large sample sizes. Some examples are presented to demonstrate various aspects of the stochastic curtailment and repeated significance testing methods presented in this manuscript.


Subject(s)
Clinical Trials as Topic; Medical Futility; Research Design; Bayes Theorem; Humans; Likelihood Functions; Sample Size
19.
BMC Med Res Methodol ; 20(1): 142, 2020 06 05.
Article in English | MEDLINE | ID: mdl-32503439

ABSTRACT

BACKGROUND: Although null hypothesis significance testing (NHST) is the agreed gold standard in medical decision making and the most widespread inferential framework used in medical research, it has several drawbacks. Bayesian methods can complement or even replace frequentist NHST, but these methods have been underutilised, mainly due to a lack of easy-to-use software. JASP is an open-source program for common operating systems, recently developed to make Bayesian inference more accessible to researchers, offering the most common tests, an intuitive graphical user interface, and publication-ready output plots. This article provides a non-technical introduction to Bayesian hypothesis testing in JASP by comparing traditional tests and statistical methods with their Bayesian counterparts. RESULTS: The comparison shows the strengths and limitations of JASP for frequentist NHST and Bayesian inference. Specifically, Bayesian hypothesis testing via Bayes factors can complement and even replace NHST in most situations in JASP. While p-values can only reject the null hypothesis, the Bayes factor can state evidence for both the null and the alternative hypothesis, making confirmation of hypotheses possible. Also, effect sizes can be precisely estimated in the Bayesian paradigm via JASP. CONCLUSIONS: Bayesian inference has not been widely used to date due to the dearth of accessible software. Medical decision making can be complemented by Bayesian hypothesis testing in JASP, providing richer information than single p-values and thus strengthening the credibility of an analysis. Through an easy point-and-click interface, researchers used to other graphical statistical packages like SPSS can seamlessly transition to JASP and benefit from the listed advantages with only a few limitations.


Subject(s)
Biomedical Research; Research Design; Bayes Theorem; Humans; Software
20.
BMC Med Res Methodol ; 20(1): 167, 2020 06 24.
Article in English | MEDLINE | ID: mdl-32580765

ABSTRACT

BACKGROUND: In medical research and practice, the p-value is arguably the most often used statistic, and yet it is widely misconstrued as the probability of the type I error, which comes with serious consequences. This misunderstanding can greatly affect the reproducibility of research, treatment selection in medical practice, and model specification in empirical analyses. By using plain language and concrete examples, this paper is intended to elucidate the p-value confusion from its root, to explicate the difference between significance and hypothesis testing, to illuminate the consequences of the confusion, and to present a viable alternative to the conventional p-value. MAIN TEXT: The confusion with p-values has plagued the research community and medical practitioners for decades. However, efforts to clarify it have been largely futile, in part because intuitive yet mathematically rigorous educational materials are scarce. Additionally, the lack of a practical alternative to the p-value for guarding against randomness also plays a role. The p-value confusion is rooted in the misconception of significance and hypothesis testing. Most, including many statisticians, are unaware that p-values and the significance testing formulated by Fisher are incompatible with the hypothesis testing paradigm created by Neyman and Pearson. And most otherwise great statistics textbooks tend to cobble the two paradigms together and make no effort to elucidate the subtle but fundamental differences between them. The p-value is a practical tool gauging the "strength of evidence" against the null hypothesis. It informs investigators that a p-value of 0.001, for example, is stronger than 0.05. However, p-values produced in significance testing are not the probabilities of type I errors as commonly misconceived. For a p-value of 0.05, the chance a treatment does not work is not 5%; rather, it is at least 28.9%. CONCLUSIONS: A long-overdue effort to understand p-values correctly is much needed. However, in medical research and practice, just banning significance testing and accepting uncertainty are not enough. Researchers, clinicians, and patients alike need to know the probability a treatment will or will not work. Thus, calibrated p-values (the probability that a treatment does not work) should be reported in research papers.
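
The 28.9% figure comes from the Sellke-Bayarri-Berger calibration, which bounds how strongly a given p-value can ever discredit the null. A sketch assuming equal prior probabilities:

```python
import numpy as np

def min_prob_null(p, prior_null=0.5):
    """Lower bound on the posterior probability that the treatment does
    not work, from the -e*p*ln(p) calibration (valid for p < 1/e)."""
    bf01_bound = -np.e * p * np.log(p)   # the bound most favourable to the alternative
    post_odds = bf01_bound * prior_null / (1.0 - prior_null)
    return post_odds / (1.0 + post_odds)

print(round(min_prob_null(0.05), 3))     # 0.289: at least 28.9%, as stated above
```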


Subject(s)
Biomedical Research; Research Design; Humans; Probability; Reproducibility of Results; Research Personnel