ABSTRACT
Out of the 166 articles published in the Journal of Industrial Microbiology and Biotechnology (JIMB) in 2019-2020 (not including special issues or review articles), 51 used a statistical test to compare two or more means. The most popular test was the (standard) t-test, which was often used to compare several pairs of means. Other statistical procedures used included Fisher's least significant difference (LSD), Tukey's honest significant difference (HSD), and Welch's t-test, and, to a lesser extent, the Bonferroni, Duncan's multiple range, Student-Newman-Keuls, and Kruskal-Wallis tests. This manuscript examines the performance of some of these tests with simulated experimental data typical of those reported by JIMB authors. The results show that many of the most common procedures used by JIMB authors lead to statistical conclusions that are prone to large false positive (Type I) errors. These error-prone procedures included the multiple t-test, multiple Welch's t-test, and Fisher's LSD. These multiple comparison procedures were compared with alternatives (Fisher-Hayter, Tukey's HSD, Bonferroni, and Dunnett's t-test) that were better able to control Type I errors. NON-TECHNICAL SUMMARY: The aim of this work was to review and recommend statistical procedures for Journal of Industrial Microbiology and Biotechnology authors, who often compare the effect of several treatments on microorganisms and their functions.
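A minimal simulation sketch (our own R illustration, not the article's code) of the Type I error inflation described above, assuming five equal-mean groups with n = 3 replicates, a design typical of the simulated JIMB-style data:

```r
set.seed(1)
fwer <- replicate(2000, {
  d <- data.frame(y = rnorm(15), g = factor(rep(1:5, each = 3)))  # all true means equal
  p_t   <- pairwise.t.test(d$y, d$g, p.adjust.method = "none")$p.value
  p_hsd <- TukeyHSD(aov(y ~ g, data = d))$g[, "p adj"]
  c(multiple_t = any(p_t < 0.05, na.rm = TRUE), tukey_hsd = any(p_hsd < 0.05))
})
rowMeans(fwer)  # family-wise Type I error: multiple t-tests far above 0.05, HSD near it
```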
Subject(s)
Industrial Microbiology, Periodicals as Topic
ABSTRACT
P-values that are derived from continuously distributed test statistics are typically uniformly distributed on (0,1) under least favorable parameter configurations (LFCs) in the null hypothesis. Conservativeness of a p-value P (meaning that P is under the null hypothesis stochastically larger than uniform on (0,1)) can occur if the test statistic from which P is derived is discrete, or if the true parameter value under the null is not an LFC. To deal with both of these sources of conservativeness, we present two approaches utilizing randomized p-values. We illustrate their effectiveness for testing a composite null hypothesis under a binomial model. We also give an example of how the proposed p-values can be used to test a composite null in group testing designs. We find that the proposed randomized p-values are less conservative compared to nonrandomized p-values under the null hypothesis, but that they are stochastically not smaller under the alternative. The problem of establishing the validity of randomized p-values has received attention in previous literature. We show that our proposed randomized p-values are valid under various discrete statistical models, which are such that the distribution of the corresponding test statistic belongs to an exponential family. The behavior of the power function for the tests based on the proposed randomized p-values as a function of the sample size is also investigated. Simulations and a real data example are used to compare the different considered p-values.
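A hedged sketch of the standard randomized p-value construction in the binomial setting discussed above (function name and test values are ours): at the boundary parameter theta0, the randomized version is exactly uniform under the null, while the nonrandomized version is stochastically larger:

```r
rand_p <- function(x, n, theta0, u = runif(1)) {
  # non-randomized p-value P(X >= x): conservative for a discrete statistic
  p_nonrand <- pbinom(x - 1, n, theta0, lower.tail = FALSE)
  # randomized p-value P(X > x) + U * P(X = x): exactly uniform at theta0
  p_rand <- pbinom(x, n, theta0, lower.tail = FALSE) + u * dbinom(x, n, theta0)
  c(nonrandomized = p_nonrand, randomized = p_rand)
}
rand_p(x = 7, n = 10, theta0 = 0.5)
```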
Subject(s)
Statistical Models, Sample Size
ABSTRACT
In biomedical research, the simultaneous inference of multiple binary endpoints may be of interest. In such cases, an appropriate multiplicity adjustment is required that controls the family-wise error rate, which represents the probability of making incorrect test decisions. In this paper, we investigate two approaches that perform single-step p-value adjustments that also take into account the possible correlation between endpoints. A rather novel and flexible approach known as multiple marginal models is considered, which is based on stacking of the parameter estimates of the marginal models and deriving their joint asymptotic distribution. We also investigate a nonparametric vector-based resampling approach, and we compare both approaches with the Bonferroni method by examining the family-wise error rate and power for different parameter settings, including low proportions and small sample sizes. The results show that the resampling-based approach consistently outperforms the other methods in terms of power, while still controlling the family-wise error rate. The multiple marginal models approach, on the other hand, shows a more conservative behavior. However, it offers more versatility in application, allowing for more complex models or straightforward computation of simultaneous confidence intervals. The practical application of the methods is demonstrated using a toxicological dataset from the National Toxicology Program.
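A hedged sketch of the multiple-marginal-models idea using the multcomp package's mmm() and mlf() functions (the simulated data, variable names, and exact call are our assumptions; consult the package documentation before relying on this interface):

```r
library(multcomp)
set.seed(1)
dat <- data.frame(trt = rep(0:1, each = 50))           # hypothetical trial data
dat$y1 <- rbinom(100, 1, 0.3 + 0.2 * dat$trt)          # binary endpoint 1
dat$y2 <- rbinom(100, 1, 0.2 + 0.1 * dat$trt)          # binary endpoint 2
fit1 <- glm(y1 ~ trt, family = binomial, data = dat)   # one marginal model per endpoint
fit2 <- glm(y2 ~ trt, family = binomial, data = dat)
# stack the marginal estimates and test both treatment effects jointly,
# using their estimated correlation (single-step max-t adjustment)
joint <- glht(mmm(endpoint1 = fit1, endpoint2 = fit2), mlf("trt = 0"))
summary(joint)
summary(joint, test = adjusted("bonferroni"))          # Bonferroni, for comparison
```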
Subject(s)
Biomedical Research, Biometry, Statistical Models, Biometry/methods, Biomedical Research/methods, Sample Size, Endpoint Determination, Humans
ABSTRACT
Hung et al. (2007) considered the problem of controlling the type I error rate for a primary and secondary endpoint in a clinical trial using a gatekeeping approach in which the secondary endpoint is tested only if the primary endpoint crosses its monitoring boundary. They considered a two-look trial and showed by simulation that the naive method of testing the secondary endpoint at full level α at the time the primary endpoint reaches statistical significance does not control the familywise error rate at level α. Tamhane et al. (2010) derived analytic expressions for familywise error rate and power and confirmed the inflated error rate of the naive approach. Nonetheless, many people mistakenly believe that the closure principle can be used to prove that the naive procedure controls the familywise error rate. The purpose of this note is to explain in greater detail why there is a problem with the naive approach and show that the degree of alpha inflation can be as high as that of unadjusted monitoring of a single endpoint.
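A hedged simulation sketch of the naive procedure (O'Brien-Fleming-type bounds for the primary endpoint; the worst-case correlation of 1 between endpoints and the drift values are our choices, not those of the cited papers): under the secondary null, the rejection rate exceeds alpha = 0.025 for intermediate primary drifts:

```r
set.seed(1)
alpha <- 0.025; z_a <- qnorm(1 - alpha)          # full-level test of the secondary
c1 <- 2.797; c2 <- 1.977                         # O'Brien-Fleming-type bounds, two looks
naive_error <- function(delta, n_sim = 2e5) {
  w1 <- rnorm(n_sim); w2 <- rnorm(n_sim)
  z1_a <- w1 + delta                             # primary at look 1 (drift delta)
  z1_b <- (w1 + w2) / sqrt(2) + delta * sqrt(2)  # primary at look 2 (information doubled)
  z2_a <- w1; z2_b <- (w1 + w2) / sqrt(2)        # secondary under H0, correlation 1
  mean((z1_a > c1 & z2_a > z_a) |                # naive rule: test the secondary at
       (z1_a <= c1 & z1_b > c2 & z2_b > z_a))    # whichever look the primary crosses
}
sapply(c(0, 1, 2, 4), naive_error)               # exceeds 0.025 at intermediate drifts
```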
Subject(s)
Statistical Models, Research Design, Humans, Endpoint Determination/methods, Computer Simulation, Sample Size
ABSTRACT
BACKGROUND: The rule of thumb that there is little gain in statistical power from obtaining more than 4 controls per case is based on type-1 error α = 0.05. However, association studies that evaluate thousands or millions of associations use smaller α and may have access to plentiful controls. We investigate power gains, and reductions in p-values, when increasing well beyond 4 controls per case, for small α. METHODS: We calculate the power, the median expected p-value, and the minimum detectable odds-ratio (OR) as a function of the number of controls/case, as α decreases. RESULTS: As α decreases, at each ratio of controls per case, the increase in power is larger than for α = 0.05. For α between 10^-6 and 10^-9 (typical for thousands or millions of associations), increasing from 4 controls per case to 10-50 controls per case increases power. For example, a study with power = 0.2 (α = 5 × 10^-8) with 1 control/case has power = 0.65 with 4 controls/case, but with 10 controls/case has power = 0.78, and with 50 controls/case has power = 0.84. For situations where obtaining more than 4 controls per case provides small increases in power beyond 0.9 (at small α), the expected p-value can decrease by orders of magnitude below α. Increasing from 1 to 4 controls/case reduces the minimum detectable OR toward the null by 20.9%, and increasing from 4 to 50 controls/case reduces it by an additional 9.7%, a result which applies regardless of α and hence also applies to "regular" α = 0.05 epidemiology. CONCLUSIONS: At small α, versus 4 controls/case, recruiting 10 or more controls/case can increase power, reduce the expected p-value by 1-2 orders of magnitude, and meaningfully reduce the minimum detectable OR. These benefits of increasing the controls/case ratio increase as the number of cases increases, although the amount of benefit depends on exposure frequencies and the true OR. Provided that controls are comparable to cases, our findings support greater sharing of comparable controls in large-scale association studies.
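A sketch of this kind of power calculation under the normal approximation to the log odds-ratio in an unmatched case-control design (the case count, exposure frequency, and OR below are illustrative assumptions):

```r
power_cc <- function(n_cases, m, p0, or, alpha) {
  p1 <- or * p0 / (1 - p0 + or * p0)           # exposure probability in cases
  se <- sqrt(1 / (n_cases * p1 * (1 - p1)) +   # SE of log(OR) with m controls per case
             1 / (m * n_cases * p0 * (1 - p0)))
  pnorm(abs(log(or)) / se - qnorm(1 - alpha / 2))
}
sapply(c(1, 4, 10, 50), function(m)            # power keeps rising past 4 controls/case
  power_cc(n_cases = 2000, m = m, p0 = 0.2, or = 1.5, alpha = 5e-8))
```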
Subject(s)
Control Groups, Odds Ratio, Research Design, Humans
ABSTRACT
Ciprofloxacin (CFX) and ofloxacin (OFX) are commonly found as residual contaminants in aquatic environments, posing potential risks to various species. To ensure the safety of aquatic wildlife, it is essential to determine the toxicity of these antibiotics and establish appropriate concentration limits. Additionally, in (eco)toxicological studies, addressing the issue of multiple hypothesis testing through p-value adjustments is crucial for robust decision-making. In this study, we assessed the no observed adverse effect concentration (NOAEC) of CFX and OFX on Moina macrocopa across a concentration range of 0-400 µg L^-1. Furthermore, we investigated multiple p-value adjustments to determine the NOAECs. Our analysis yielded consistent results across seven different p-value adjustments, indicating NOAECs of 100 µg CFX L^-1 for age at first reproduction and 200 µg CFX L^-1 for fertility. For OFX treatment, a NOAEC of 400 µg L^-1 was observed for both biomarkers. However, further investigation is required to establish the NOAEC of OFX at higher concentrations with greater certainty. Our findings demonstrate that CFX exhibits higher toxicity compared to OFX, consistent with previous research. Moreover, this study highlights the differential performance of p-value adjustment methods in terms of maintaining statistical power while controlling the multiplicity problem, and their practical applicability. The study emphasizes the low NOAECs for these antibiotics in the zooplanktonic group, highlighting their significant risks to ecological and environmental safety. Additionally, our investigation of p-value adjustment approaches contributes to a deeper understanding of their performance characteristics, enabling (eco)toxicologists to select appropriate methods based on their specific needs and priorities.
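A hedged sketch of this kind of NOAEC determination on simulated data, comparing a few of the adjustment methods available through base R's p.adjust (the concentrations, responses, and effect sizes are hypothetical):

```r
set.seed(1)
tox <- data.frame(conc = factor(rep(c(0, 50, 100, 200, 400), each = 10)),
                  age_repro = c(rnorm(30, 5), rnorm(10, 5.4), rnorm(10, 6)))
p_raw <- sapply(levels(tox$conc)[-1], function(g)   # each concentration vs control
  t.test(age_repro ~ conc,
         data = droplevels(subset(tox, conc %in% c("0", g))))$p.value)
round(sapply(c("bonferroni", "holm", "hochberg", "BH"),
             function(m) p.adjust(p_raw, method = m)), 4)
# NOAEC: the highest concentration showing no significant adverse effect
```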
ABSTRACT
The cluster mass test has been widely used for massively univariate tests in M/EEG, fMRI and, recently, pupillometry analysis. It is a powerful method for detecting effects while weakly controlling the family-wise error rate (FWER), although its correct interpretation can only be made at the cluster level, without any point-wise conclusion. This implies that the discoveries of a cluster mass test cannot be precisely localized in time or in space. We propose a new multiple comparisons procedure, the cluster depth tests, that controls the FWER while allowing an interpretation at the time-point level. We show the conditions for strong control of the FWER, and a simulation study shows that the cluster depth tests achieve high power and guarantee the FWER even in the presence of physiologically plausible effects. By having an interpretation at the time point/voxel level, the cluster depth tests make it possible to take full advantage of the high temporal resolution of EEG recordings and give a precise timing of the start and end of the significant effects.
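For contrast with the proposed cluster depth tests, a hedged sketch of the classical cluster mass permutation test on simulated one-sample data (the sizes, threshold, and effect window are our assumptions); note that its p-value attaches to a whole cluster, not to individual time points:

```r
set.seed(1)
diffs <- matrix(rnorm(20 * 100), 20, 100)    # 20 subjects x 100 time points
diffs[, 40:60] <- diffs[, 40:60] + 0.8       # effect spanning time points 40-60
tvals <- function(d) colMeans(d) / (apply(d, 2, sd) / sqrt(nrow(d)))
max_mass <- function(tv, thr = 2) {          # largest suprathreshold cluster mass
  r <- rle(abs(tv) > thr)
  if (!any(r$values)) return(0)
  ends <- cumsum(r$lengths); starts <- ends - r$lengths + 1
  max(mapply(function(s, e) sum(abs(tv[s:e])), starts[r$values], ends[r$values]))
}
obs  <- max_mass(tvals(diffs))
perm <- replicate(999, max_mass(tvals(diffs * sample(c(-1, 1), 20, TRUE))))
mean(c(perm, obs) >= obs)                    # cluster-level p-value only
```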
Subject(s)
Brain/physiology, Electroencephalography/methods, Magnetic Resonance Imaging/methods, Statistical Models, Cluster Analysis, Computer Simulation, Humans
ABSTRACT
When a ranking of institutions such as medical centers or universities is based on a numerical measure of performance provided with a standard error, confidence intervals (CIs) should be calculated to assess the uncertainty of these ranks. We present a novel method based on Tukey's honest significant difference test to construct simultaneous CIs for the true ranks. When all the true performances are equal, the coverage probability of our method attains the nominal level. When the true performance measures have no exact ties, our method is conservative. For this situation, we propose a rescaling of the method to the nominal level that results in shorter CIs while keeping control of the simultaneous coverage. We also show that a similar rescaling can be applied to correct a recently proposed Monte Carlo based method, which is anticonservative. After rescaling, the two methods perform very similarly. However, rescaling the Monte Carlo based method is computationally much more demanding and becomes infeasible when the number of institutions is larger than 30-50. We discuss another recently proposed method, similar to ours, based on simultaneous CIs for the true performances, and show that our method provides uniformly shorter CIs for the same confidence level. We illustrate the superiority of our new methods with analyses of travel time to work in the United States and of rankings of 64 hospitals in the Netherlands.
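A hedged sketch of the rank-CI construction described above: Tukey-type simultaneous pairwise comparisons of the performance measures, with each institution's true rank bounded by counting significant wins and losses (the data and the equal-SE simplification are ours):

```r
set.seed(1)
K <- 10
theta <- rnorm(K); se <- rep(0.3, K)                   # estimates and standard errors
crit  <- qtukey(0.95, nmeans = K, df = 1e5) / sqrt(2)  # Tukey HSD critical value
above <- outer(theta, theta, "-") / sqrt(outer(se^2, se^2, "+")) > crit
lower <- 1 + colSums(above)   # rank 1 = best: count institutions certainly better
upper <- K - rowSums(above)   # count institutions certainly worse
data.frame(estimate = round(theta, 2), rank_lower = lower, rank_upper = upper)
```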
Subject(s)
Hospitals, Research Design, Confidence Intervals, Monte Carlo Method, Probability, United States
ABSTRACT
Sequential, multiple assignment, randomized trials (SMARTs) compare sequences of treatment decision rules called dynamic treatment regimes (DTRs). In particular, the Adaptive Treatment for Alcohol and Cocaine Dependence (ENGAGE) SMART aimed to determine the best DTRs for patients with a substance use disorder. While many authors have focused on a single pairwise comparison, addressing the main goal involves comparisons of >2 DTRs. For complex comparisons, there is a paucity of methods for binary outcomes. We fill this gap by extending the multiple comparisons with the best (MCB) methodology to the Bayesian binary outcome setting. The set of best is constructed based on simultaneous credible intervals. A substantial challenge for power analysis is the correlation between outcome estimators for distinct DTRs embedded in SMARTs due to overlapping subjects. We address this using Robins' G-computation formula to take a weighted average of parameter draws obtained via simulation from the parameter posteriors. We use non-informative priors and work with the exact distribution of parameters avoiding unnecessary normality assumptions and specification of the correlation matrix of DTR outcome summary statistics. We conduct simulation studies for both the construction of a set of optimal DTRs using the Bayesian MCB procedure and the sample size calculation for two common SMART designs. We illustrate our method on the ENGAGE SMART. The R package SMARTbayesR for power calculations is freely available on the Comprehensive R Archive Network (CRAN) repository. An RShiny app is available at https://wilart.shinyapps.io/shinysmartbayesr/.
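A deliberately simplified, hedged sketch of the Bayesian MCB idea for independent binary-outcome arms, using flat priors and marginal (rather than simultaneous) credible bounds; it omits the G-computation machinery needed for DTRs embedded in a SMART, where subjects overlap:

```r
set.seed(1)
successes <- c(30, 35, 42, 28); n <- rep(60, 4)    # hypothetical arm-level outcomes
draws <- sapply(seq_along(n), function(i)          # 10,000 posterior draws per arm
  rbeta(1e4, 1 + successes[i], 1 + n[i] - successes[i]))
shortfall <- apply(draws, 1, max) - draws          # theta_best - theta_i, draw by draw
upper <- apply(shortfall, 2, quantile, probs = 0.95)
which(upper <= 0.05)                               # "set of best": within a 0.05 margin
```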
Subject(s)
Research Design, Bayes Theorem, Computer Simulation, Humans, Sample Size
ABSTRACT
This article proposes four new principles for logical biomarker cut-point selection methods to adhere to: subgroup sensibility, sensitivity, specificity, and target monotonicity. At every cut-point value, our method gives confidence intervals not only for the efficacy at that cut-point value, but also for the efficacies in the marker-positive and marker-negative subgroups defined by that cut-point. These confidence intervals are given simultaneously for all possible cut-point values. Using Alzheimer's disease (AD) and type 2 diabetes (T2DM) as examples, we show that our method achieves the four principles. Our method strongly controls the familywise type I error rate (FWER) across both levels of multiplicity: the multiplicity of having marker-positive and marker-negative subgroups at each cut-point, and the multiplicity of searching through infinitely many cut-points. This is in contrast to other available methods. The confidence level of our simultaneous confidence intervals is in fact exact (not conservative). An application (app) implementing the proposed method is available.
Subject(s)
Alzheimer's Disease, Type 2 Diabetes Mellitus, Alzheimer's Disease/diagnosis, Alzheimer's Disease/genetics, Biomarkers, Confidence Intervals, Type 2 Diabetes Mellitus/diagnosis, Humans, Research Design
ABSTRACT
Much of the research on multiple comparison and simultaneous inference in the past 60 years or so has been for the comparisons of several population means. Spurrier seems to have been the first to investigate multiple comparisons of several simple linear regression lines using simultaneous confidence bands. In this paper, we extend the work of Liu et al. for finite comparisons of several univariate linear regression models using simultaneous confidence bands to finite comparisons of several multivariate linear regression models using simultaneous confidence tubes. We show how simultaneous confidence tubes can be constructed to allow more informative inferences for the comparison of several multivariate linear regression models than the current approach of hypotheses testing. The methods are illustrated with examples.
Subject(s)
Statistical Models, Confidence Intervals, Linear Models, Multivariate Analysis
ABSTRACT
Sequential, multiple assignment, randomized trial (SMART) designs have become increasingly popular in the field of precision medicine by providing a means for comparing more than two sequences of treatments tailored to the individual patient, i.e., dynamic treatment regimes (DTRs). The construction of evidence-based DTRs promises a replacement for the ad hoc one-size-fits-all decisions pervasive in patient care. However, there are substantial statistical challenges in sizing SMART designs due to the correlation structure between the DTRs embedded in the design (EDTRs). Since a primary goal of SMARTs is the construction of an optimal EDTR, investigators are interested in sizing SMARTs based on the ability to screen out EDTRs inferior to the optimal EDTR by a given amount, which cannot be done using existing methods. In this article, we fill this gap by developing a rigorous power analysis framework that leverages the multiple comparisons with the best methodology. Our method employs Monte Carlo simulation to compute the number of individuals to enroll in an arbitrary SMART. We evaluate our method through extensive simulation studies. We illustrate our method by retrospectively computing the power in the Extending Treatment Effectiveness of Naltrexone (EXTEND) trial. An R package implementing our methodology is available to download from the Comprehensive R Archive Network.
Subject(s)
Biomedical Research, Statistical Models, Randomized Controlled Trials as Topic, Research Design, Biomedical Research/methods, Biomedical Research/standards, Humans, Monte Carlo Method, Naltrexone/pharmacology, Outcome Assessment in Health Care/methods, Randomized Controlled Trials as Topic/methods, Randomized Controlled Trials as Topic/standards, Research Design/standards, Sample Size
ABSTRACT
The "replication crisis" has been attributed to perverse incentives that lead to selective reporting and misinterpretations of P-values and confidence intervals. A crude fix offered for this problem is to lower testing cut-offs (α levels), either directly or in the form of null-biased multiple comparisons procedures such as naïve Bonferroni adjustments. Methodologists and statisticians have expressed positions that range from condemning all such procedures to demanding their application in almost all analyses. Navigating between these unjustifiable extremes requires defining analysis goals precisely enough to separate inappropriate from appropriate adjustments. To meet this need, I here review issues arising in single-parameter inference (such as error costs and loss functions) that are often skipped in basic statistics, yet are crucial to understanding controversies in testing and multiple comparisons. I also review considerations that should be made when examining arguments for and against modifications of decision cut-offs and adjustments for multiple comparisons. The goal is to provide researchers a better understanding of what is assumed by each side and to enable recognition of hidden assumptions. Basic issues of goal specification and error costs are illustrated with simple fixed cut-off hypothesis testing scenarios. These illustrations show how adjustment choices are extremely sensitive to implicit decision costs, making it inevitable that different stakeholders will vehemently disagree about what is necessary or appropriate. Because decisions cannot be justified without explicit costs, resolution of inference controversies is impossible without recognising this sensitivity. Pre-analysis statements of funding, scientific goals, and analysis plans can help counter demands for inappropriate adjustments, and can provide guidance as to what adjustments are advisable. Hierarchical (multilevel) regression methods (including Bayesian, semi-Bayes, and empirical-Bayes methods) provide preferable alternatives to conventional adjustments, insofar as they facilitate use of background information in the analysis model, and thus can provide better-informed estimates on which to base inferences and decisions.
Subject(s)
Goals, Research Design, Bayes Theorem, Humans
ABSTRACT
The multiple testing problem arises not only when there are many voxels or vertices in an image representation of the brain, but also when multiple contrasts of parameter estimates (that represent hypotheses) are tested in the same general linear model. We argue that a correction for this multiplicity must be performed to avoid an excess of false positives. Various methods for correction have been proposed in the literature, but few have been applied to brain imaging. Here we discuss and compare different methods for making such a correction in different scenarios, showing that one classical and well-known method is invalid, and argue that permutation is the best option for performing the correction due to its exactness and its flexibility to handle a variety of common imaging situations.
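A hedged sketch of one standard permutation construction for this setting, the maximum-statistic method under sign-flipping, applied to hypothetical subject-level contrast estimates (the article's preferred algorithm may differ in detail):

```r
set.seed(1)
X <- matrix(rnorm(20 * 3), 20, 3)   # 20 subjects x 3 contrasts of parameter estimates
tstat <- function(M) colMeans(M) / (apply(M, 2, sd) / sqrt(nrow(M)))
obs <- abs(tstat(X))
null_max <- replicate(4999,         # sign-flipping preserves correlation among contrasts
  max(abs(tstat(X * sample(c(-1, 1), nrow(X), replace = TRUE)))))
sapply(obs, function(t) mean(c(null_max, t) >= t))  # FWER-adjusted p-values
```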
Subject(s)
Brain/diagnostic imaging, Statistical Data Interpretation, Statistical Models, Neuroimaging/standards, Adult, Humans, Neuroimaging/methods, Psychological Distance, Psychometrics/instrumentation
ABSTRACT
BACKGROUND: Multi-arm designs provide an effective means of evaluating several treatments within the same clinical trial. Given the large number of treatments now available for testing in many disease areas, it has been argued that their utilisation should increase. However, for any given clinical trial there are numerous possible multi-arm designs that could be used, and choosing between them can be a difficult task. This task is complicated further by a lack of available easy-to-use software for designing multi-arm trials. RESULTS: To aid the wider implementation of multi-arm clinical trial designs, we have developed a web application for sample size calculation when using a variety of popular multiple comparison corrections. Furthermore, the application supports sample size calculation to control several varieties of power, as well as the determination of optimised arm-wise allocation ratios. It is built using the Shiny package in the R programming language, is free to access on any device with an internet browser, and requires no programming knowledge to use. It incorporates a variety of features to make it easier to use, including help boxes and warning messages. Using design parameters motivated by a recently completed phase II oncology trial, we demonstrate that the application can effectively determine and evaluate complex multi-arm trial designs. CONCLUSIONS: The application provides the core information required by statisticians and clinicians to review the operating characteristics of a chosen multi-arm clinical trial design. The range of designs supported by the application is broader than other currently available software solutions. Its primary limitation, particularly from a regulatory agency point of view, is its lack of validation. However, we present an approach to efficiently confirming its results via simulation.
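As a hedged illustration of the core calculation such an application performs, the snippet below sizes each arm of a trial with K experimental arms against a control for marginal power at a Bonferroni-adjusted level (design values are illustrative):

```r
K <- 3; alpha <- 0.05                       # three experimental arms vs control
n <- power.t.test(delta = 0.5, sd = 1,      # standardized effect of interest
                  sig.level = alpha / K,    # Bonferroni-adjusted pairwise level
                  power = 0.8)$n
ceiling(n)                                  # per-arm sample size, marginal power
```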
Subject(s)
Clinical Trials as Topic/methods, Statistical Data Interpretation, Humans, Research Design, Sample Size, Software, Web Browser
ABSTRACT
BACKGROUND: Bayesian adaptive methods are increasingly being used to design clinical trials and offer several advantages over traditional approaches. Decisions at analysis points are usually based on the posterior distribution of the treatment effect. However, there is some confusion as to whether control of type I error is required for Bayesian designs as this is a frequentist concept. METHODS: We discuss the arguments for and against adjusting for multiplicities in Bayesian trials with interim analyses. With two case studies we illustrate the effect of including interim analyses on type I/II error rates in Bayesian clinical trials where no adjustments for multiplicities are made. We propose several approaches to control type I error, and also alternative methods for decision-making in Bayesian clinical trials. RESULTS: In both case studies we demonstrated that the type I error was inflated in the Bayesian adaptive designs through incorporation of interim analyses that allowed early stopping for efficacy and without adjustments to account for multiplicity. Incorporation of early stopping for efficacy also increased the power in some instances. An increase in the number of interim analyses that only allowed early stopping for futility decreased the type I error, but also decreased power. An increase in the number of interim analyses that allowed for either early stopping for efficacy or futility generally increased type I error and decreased power. CONCLUSIONS: Currently, regulators require demonstration of control of type I error for both frequentist and Bayesian adaptive designs, particularly for late-phase trials. To demonstrate control of type I error in Bayesian adaptive designs, adjustments to the stopping boundaries are usually required for designs that allow for early stopping for efficacy as the number of analyses increase. If the designs only allow for early stopping for futility then adjustments to the stopping boundaries are not needed to control type I error. If one instead uses a strict Bayesian approach, which is currently more accepted in the design and analysis of exploratory trials, then type I errors could be ignored and the designs could instead focus on the posterior probabilities of treatment effects of clinically-relevant values.
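A hedged simulation sketch of the inflation discussed above: a design that stops for efficacy whenever the posterior probability of benefit exceeds 0.975 at one of four looks, with a flat prior, known variance, and no multiplicity adjustment (all our illustrative assumptions):

```r
set.seed(1)
looks <- c(25, 50, 75, 100); threshold <- 0.975
stopped <- replicate(2e4, {
  y <- rnorm(100)                                 # H0 true: zero treatment effect
  any(sapply(looks, function(n)
    pnorm(mean(y[1:n]) * sqrt(n)) > threshold))   # P(mu > 0 | data), flat prior, sd = 1
})
mean(stopped)   # > 0.025: each additional look adds a chance to stop for "efficacy"
```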
Subject(s)
Medical Futility, Research Design, Bayes Theorem, Humans, Probability
ABSTRACT
A clinical trial often has primary and secondary endpoints and comparisons of high and low doses of a study drug to a control. Multiplicity arises not only from the multiple comparisons of study drugs versus the control, but also from the hierarchical structure of the hypotheses. Closed test procedures were proposed as general methods to address multiplicity. Two commonly used tests for intersection hypotheses in closed test procedures are the Simes test and the average method. When the treatment effect of a less efficacious dose is not much smaller than the treatment effect of a more efficacious dose for a specific endpoint, the average method has better power than the Simes test for the comparison of two doses versus control. Accordingly, for inferences for primary and secondary endpoints, the matched parallel gatekeeping procedure based on the Simes test for testing intersection hypotheses is extended here to allow the average method for such testing. This procedure is further extended to clinical trials with more than two endpoints as well as to clinical trials with more than two active doses and a control.
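Hedged sketches of the two intersection tests named above (inputs are illustrative; the correlation of 0.5 reflects two equal-size dose groups sharing a control arm):

```r
simes_p <- function(p) min(sort(p) * length(p) / seq_along(p))  # Simes combination
avg_reject <- function(z1, z2, alpha = 0.025, rho = 0.5) {      # average method
  (z1 + z2) / 2 / sqrt((1 + rho) / 2) > qnorm(1 - alpha)        # Var(zbar) = (1 + rho)/2
}
simes_p(c(0.012, 0.030))         # intersection p-value from the Simes test
avg_reject(z1 = 2.1, z2 = 1.9)   # TRUE when both doses show similar effects
```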
Subject(s)
Phase III Clinical Trials as Topic/statistics & numerical data, Randomized Controlled Trials as Topic/statistics & numerical data, Research Design/statistics & numerical data, Antidepressive Agents/therapeutic use, Computer Simulation, Statistical Data Interpretation, Major Depressive Disorder/diagnosis, Major Depressive Disorder/drug therapy, Major Depressive Disorder/psychology, Drug Dose-Response Relationship, Endpoint Determination/statistics & numerical data, Humans, Statistical Models, Quinolones/administration & dosage, Thiophenes/administration & dosage, Treatment Outcome
ABSTRACT
Nonparametric multiple comparisons are a powerful statistical inference tool in psychological studies. In this paper, we review a rank-based nonparametric multiple contrast test procedure (MCTP) and propose an improvement by allowing the procedure to accommodate various effect sizes. In the review, we describe relative effects and show how utilizing the unweighted reference distribution in defining the relative effects in multiple samples may avoid the nontransitive paradoxes. Next, to improve the procedure, we allow the relative effects to be transformed by using the multivariate delta method and suggest a log odds-type transformation, which leads to effect sizes similar to Cohen's d for easier interpretation. Then, we provide theoretical justifications for an asymptotic strong control of the family-wise error rate (FWER) of the proposed method. Finally, we illustrate its use with a simulation study and an example from a neuropsychological study. The proposed method is implemented in the 'nparcomp' R package via the 'mctp' function.
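A hedged usage sketch of the mctp() function named above, on simulated data (argument values are illustrative; see ?mctp for the full interface):

```r
library(nparcomp)
set.seed(1)
study <- data.frame(score = c(rnorm(15, 0), rnorm(15, 0.7), rnorm(15, 1.1)),
                    group = factor(rep(c("A", "B", "C"), each = 15)))
res <- mctp(score ~ group, data = study,   # Tukey-type all-pairwise contrasts
            type = "Tukey", conf.level = 0.95)
summary(res)   # simultaneous tests and confidence intervals for relative effects
```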
Subject(s)
Biometry, Statistical Data Interpretation
ABSTRACT
It is known that the one-sided Simes' test controls the error rate if the underlying distribution is multivariate totally positive of order 2 (MTP2), but not in general. The two-sided test also controls the error rate when the vector of coordinate absolute values has an MTP2 distribution, which holds more generally. We prove mathematically that when the two-sided test controls the error rate at level 2α, certain kinds of truncated Simes' tests also control the one-sided error rate at level α. We also compare the closure of the truncated tests with the Holm, Hochberg, and Hommel procedures in many scenarios when the test statistics are multivariate normal.
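For reference, a short sketch computing a Simes intersection p-value and the Holm, Hochberg, and Hommel adjustments via base R's p.adjust (p-values are illustrative):

```r
p <- c(0.011, 0.020, 0.040, 0.300)                # illustrative one-sided p-values
min(sort(p) * length(p) / seq_along(p))           # Simes p-value for the intersection
rbind(holm     = p.adjust(p, method = "holm"),
      hochberg = p.adjust(p, method = "hochberg"),
      hommel   = p.adjust(p, method = "hommel"))
```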
Subject(s)
Statistical Data Interpretation, Multivariate Analysis, Statistical Distributions, Biometry, Confidence Intervals, Humans, Scientific Experimental Error/statistics & numerical data
ABSTRACT
Response-adaptive designs allow the randomization probabilities to change during the course of a trial based on cumulated response data so that a greater proportion of patients can be allocated to the better performing treatments. A major concern over the use of response-adaptive designs in practice, particularly from a regulatory viewpoint, is controlling the type I error rate. In particular, we show that the naïve z-test can have an inflated type I error rate even after applying a Bonferroni correction. Simulation studies have often been used to demonstrate error control but do not provide a guarantee. In this article, we present adaptive testing procedures for normally distributed outcomes that ensure strong familywise error control by iteratively applying the conditional invariance principle. Our approach can be used for fully sequential and block randomized trials and for a large class of adaptive randomization rules found in the literature. We show there is a high price to pay in terms of power to guarantee familywise error control for randomization schemes with extreme allocation probabilities. However, for proposed Bayesian adaptive randomization schemes in the literature, our adaptive tests maintain or increase the power of the trial compared to the z-test. We illustrate our method using a three-armed trial in primary hypercholesterolemia.
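A hedged simulation sketch of the problem with the naive test: a two-armed trial using a Thompson-type Bayesian response-adaptive rule for normal outcomes, analysed with the unadjusted z-test (design constants are our assumptions, not the article's scheme):

```r
set.seed(1)
one_trial <- function(n = 100, burn = 20) {
  arm <- rep(0:1, length.out = burn)          # equal-allocation burn-in
  y <- rnorm(burn)                            # H0: both arms have mean 0, sd 1
  for (i in (burn + 1):n) {
    d <- mean(y[arm == 1]) - mean(y[arm == 0])
    v <- 1 / sum(arm == 1) + 1 / sum(arm == 0)
    p_best <- pnorm(d / sqrt(v))              # posterior P(arm 1 better), flat priors
    arm <- c(arm, rbinom(1, 1, p_best))       # randomize toward the apparent leader
    y <- c(y, rnorm(1))
  }
  d <- mean(y[arm == 1]) - mean(y[arm == 0])
  abs(d) / sqrt(1 / sum(arm == 1) + 1 / sum(arm == 0))
}
z <- replicate(2000, one_trial())
mean(z > qnorm(0.975))   # naive two-sided type I error, inflated above 0.05
```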