Results 1 - 20 of 155
1.
Trends Biochem Sci ; 48(6): 503-512, 2023 06.
Article in English | MEDLINE | ID: mdl-36842858

ABSTRACT

Over recent years many statisticians and researchers have highlighted that statistical inference would benefit from a better use and understanding of hypothesis testing, p-values, and statistical significance. We highlight three recommendations in the context of biochemical sciences. First recommendation: to improve the biological interpretation of biochemical data, do not use p-values (or similar test statistics) as thresholded values to select biomolecules. Second recommendation: to improve comparison among studies and to achieve robust knowledge, perform complete reporting of data. Third recommendation: statistical analyses should be reported completely with exact numbers (not as asterisks or inequalities). Owing to the high number of variables, a better use of statistics is of special importance in omic studies.

2.
Stat Med ; 43(8): 1577-1603, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38339872

ABSTRACT

Due to the dependency structure in the sampling process, adaptive trial designs create challenges in point and interval estimation and in the calculation of P-values. Optimal adaptive designs, which are designs where the parameters governing the adaptivity are chosen to maximize some performance criterion, suffer from the same problem. Various analysis methods which are able to handle this dependency structure have already been developed. In this work, we aim to give a comprehensive summary of these methods and show how they can be applied to the class of designs with planned adaptivity, of which optimal adaptive designs are an important member. The defining feature of these kinds of designs is that the adaptive elements are completely prespecified. This allows for explicit descriptions of the calculations involved, which makes it possible to evaluate different methods in a fast and accurate manner. We will explain how to do so, and present an extensive comparison of the performance characteristics of various estimators between an optimal adaptive design and its group-sequential counterpart.


Subjects
Research Design, Humans, Confidence Intervals, Sample Size
3.
Multivariate Behav Res ; 59(4): 738-757, 2024.
Article in English | MEDLINE | ID: mdl-38587864

ABSTRACT

Calculating confidence intervals and p-values of edges in networks is useful to decide their presence or absence and it is a natural way to quantify uncertainty. Since lasso estimation is often used to obtain edges in a network, and the underlying distribution of lasso estimates is discontinuous and has probability one at zero when the estimate is zero, obtaining p-values and confidence intervals is problematic. It is also not always desirable to use the lasso to select the edges because there are assumptions required for correct identification of network edges that may not be warranted for the data at hand. Here, we review three methods that either use a modified lasso estimate (desparsified or debiased lasso) or a method that uses the lasso for selection and then determines p-values without the lasso. We compare these three methods with popular methods to estimate Gaussian Graphical Models in simulations and conclude that the desparsified lasso and its bootstrapped version appear to be the best choices for selection and quantifying uncertainty with confidence intervals and p-values.


Subjects
Computer Simulation, Statistical Models, Humans, Computer Simulation/statistics & numerical data, Data Interpretation, Statistical, Uncertainty, Confidence Intervals
4.
Biom J ; 66(1): e2300177, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38102999

ABSTRACT

Online testing procedures assume that hypotheses are observed in sequence, and allow the significance thresholds for upcoming tests to depend on the test statistics observed so far. Some of the most popular online methods include alpha investing, LORD++, and SAFFRON. These three methods have been shown to provide online control of the "modified" false discovery rate (mFDR) under a condition known as CS. However, to our knowledge, LORD++ and SAFFRON have only been shown to control the traditional false discovery rate (FDR) under an independence condition on the test statistics. Our work bolsters these results by showing that SAFFRON and LORD++ additionally ensure online control of the FDR under a "local" form of nonnegative dependence. Further, FDR control is maintained under certain types of adaptive stopping rules, such as stopping after a certain number of rejections have been observed. Because alpha investing can be recovered as a special case of the SAFFRON framework, our results immediately apply to alpha investing as well. In the process of deriving these results, we also formally characterize how the conditional super-uniformity assumption implicitly limits the allowed p-value dependencies. This implicit limitation is important not only to our proposed FDR result, but also to many existing mFDR results.
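Alpha investing, the special case mentioned above, is easy to sketch. The spending rule (10% of current wealth) and the payout value below are illustrative choices for exposition, not the tuned parameters of SAFFRON or LORD++:

```python
def alpha_investing(pvals, initial_wealth=0.05, payout=0.025):
    """Minimal sketch of online testing with alpha-wealth (Foster-Stine style).

    Processes p-values in sequence: each test spends some wealth; a rejection
    earns wealth back, so early discoveries fund later, more lenient tests.
    """
    wealth = initial_wealth
    rejections = []
    for j, p in enumerate(pvals):
        if wealth <= 0:
            break
        alpha_j = wealth / 10.0          # spend 10% of current wealth (illustrative rule)
        if p <= alpha_j:
            rejections.append(j)
            wealth += payout             # earn alpha-wealth on a rejection
        else:
            wealth -= alpha_j / (1 - alpha_j)  # pay for a failed test
    return rejections
```

Note how the threshold for the third test below is larger than it would have been without the first rejection.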


Subjects
Crocus, Research Design, False Positive Reactions
5.
Biom J ; 66(6): e202300198, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39162085

ABSTRACT

Lesion-symptom mapping studies provide insight into what areas of the brain are involved in different aspects of cognition. This is commonly done via behavioral testing in patients with a naturally occurring brain injury or lesions (e.g., strokes or brain tumors). This results in high-dimensional observational data where lesion status (present/absent) is nonuniformly distributed, with some voxels having lesions in very few (or no) subjects. In this situation, mass univariate hypothesis tests have severe power heterogeneity where many tests are known a priori to have little to no power. Recent advancements in multiple testing methodologies allow researchers to weigh hypotheses according to side information (e.g., information on power heterogeneity). In this paper, we propose the use of p-value weighting for voxel-based lesion-symptom mapping studies. The weights are created using the distribution of lesion status and spatial information to estimate different non-null prior probabilities for each hypothesis test through some common approaches. We provide a monotone minimum weight criterion, which requires minimum a priori power information. Our methods are demonstrated on dependent simulated data and an aphasia study investigating which regions of the brain are associated with the severity of language impairment among stroke survivors. The results demonstrate that the proposed methods have robust error control and can increase power. Further, we showcase how weights can be used to identify regions that are inconclusive due to lack of power.
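The weighting idea can be illustrated with the classical weighted Bonferroni procedure, which voxel-level weights of this kind build on; the weights below are made up for illustration and must average to 1 to preserve FWER control:

```python
import numpy as np

def weighted_bonferroni(pvals, weights, alpha=0.05):
    """Weighted Bonferroni sketch: reject H_i when p_i <= w_i * alpha / m.

    Weights averaging to 1 redistribute alpha toward hypotheses believed
    a priori to have more power (e.g., well-lesioned voxels).
    """
    p = np.asarray(pvals, dtype=float)
    w = np.asarray(weights, dtype=float)
    m = len(p)
    assert abs(w.mean() - 1.0) < 1e-9, "weights must average to 1 to keep FWER"
    return p <= w * alpha / m
```

With two tests at p = 0.02 each, up-weighting the first (w = 1.5) makes it rejectable at level 0.05 while the down-weighted second is not.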


Subjects
Biometry, Humans, Biometry/methods, Aphasia/physiopathology, Brain/diagnostic imaging, Brain Mapping/methods, False Positive Reactions
6.
Entropy (Basel) ; 26(2)2024 Jan 29.
Article in English | MEDLINE | ID: mdl-38392373

ABSTRACT

The Non-Informative Nuisance Parameter Principle concerns the problem of how inferences about a parameter of interest should be made in the presence of nuisance parameters. The principle is examined in the context of the hypothesis testing problem. We prove that the mixed test obeys the principle for discrete sample spaces. We also show how adherence of the mixed test to the principle can make performing the test much easier. These findings are illustrated with new solutions to well-known problems of testing hypotheses for count data.

7.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34015820

ABSTRACT

Large datasets of hundreds to thousands of individuals measuring RNA-seq in observational studies are becoming available. Many popular software packages for analysis of RNA-seq data were constructed to study differences in expression signatures in an experimental design with well-defined conditions (exposures). In contrast, observational studies may have varying levels of confounding transcript-exposure associations; further, exposure measures may vary from discrete (exposed, yes/no) to continuous (levels of exposure), with non-normal distributions of exposure. We compare popular software for gene expression-DESeq2, edgeR and limma-as well as linear regression-based analyses for studying the association of continuous exposures with RNA-seq. We developed a computation pipeline that includes transformation, filtering and generation of empirical null distribution of association P-values, and we apply the pipeline to compute empirical P-values with multiple testing correction. We employ a resampling approach that allows for assessment of false positive detection across methods, power comparison and the computation of quantile empirical P-values. The results suggest that linear regression methods are substantially faster with better control of false detections than other methods, even with the resampling method to compute empirical P-values. We provide the proposed pipeline with fast algorithms in an R package Olivia, and implemented it to study the associations of measures of sleep disordered breathing with RNA-seq in peripheral blood mononuclear cells in participants from the Multi-Ethnic Study of Atherosclerosis.
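The empirical-null idea in such a pipeline can be sketched with a generic permutation p-value for a continuous exposure; the correlation statistic and the add-one correction here are illustrative choices, not the Olivia package's exact implementation:

```python
import numpy as np

def empirical_pvalue(y, x, n_perm=10000, seed=0):
    """Permutation-based empirical p-value for association of exposure x with y.

    Permuting x breaks any x-y association while preserving the marginal
    distributions, giving an empirical null for the |correlation| statistic.
    """
    rng = np.random.default_rng(seed)
    obs = abs(np.corrcoef(x, y)[0, 1])
    null = np.empty(n_perm)
    for b in range(n_perm):
        null[b] = abs(np.corrcoef(rng.permutation(x), y)[0, 1])
    # add-one correction keeps the estimated p-value away from exactly zero
    return (1 + np.sum(null >= obs)) / (1 + n_perm)
```

A perfectly linear exposure-outcome pair yields the smallest attainable p-value, while an unrelated alternating exposure does not.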


Subjects
Benchmarking/methods, RNA-Seq, Sequence Analysis, RNA, Software, Algorithms, Atherosclerosis/epidemiology, Atherosclerosis/etiology, Atherosclerosis/metabolism, Computer Simulation, Disease Susceptibility, Genetic Predisposition to Disease, High-Throughput Nucleotide Sequencing, Humans, Mutation, Phenotype, Risk Assessment, Risk Factors, Web Browser
8.
Hum Reprod ; : 548-558, 2023 Nov 28.
Article in English | MEDLINE | ID: mdl-38015794

ABSTRACT

STUDY QUESTION: What were the frequency and temporal trends of reporting P-values and effect measures in the abstracts of reproductive medicine studies in 1990-2022, how were reported P-values distributed, and what proportion of articles that present with statistical inference reported statistically significant results, i.e. 'positive' results?

SUMMARY ANSWER: Around one in six abstracts reported P-values alone without effect measures, while the prevalence of effect measures, whether reported alone or accompanied by P-values, has been increasing, especially in meta-analyses and randomized controlled trials (RCTs); the reported P-values were frequently observed around certain cut-off values, notably at 0.001, 0.01, or 0.05, and among abstracts present with statistical inference (i.e. P-value, CIs, or significant terms), a large majority (77%) reported at least one statistically significant finding.

WHAT IS KNOWN ALREADY: Publishing or reporting only results that show a 'positive' finding causes bias in evaluating interventions and risk factors and may incur adverse health outcomes for patients. Despite efforts to minimize publication reporting bias in medical research, it remains unclear whether the magnitude and patterns of the bias have changed over time.

STUDY DESIGN, SIZE, DURATION: We studied abstracts of reproductive medicine studies from 1990 to 2022. The reproductive medicine studies were published in 23 first-quartile journals under the category of Obstetrics and Gynaecology and Reproductive Biology in Journal Citation Reports and 5 high-impact general medical journals (The Journal of the American Medical Association, The Lancet, The BMJ, The New England Journal of Medicine, and PLoS Medicine). Articles without abstracts, animal studies, and non-research articles, such as case reports or guidelines, were excluded.

PARTICIPANTS/MATERIALS, SETTING, METHODS: Automated text-mining was used to extract three types of statistical significance reporting, including P-values, CIs, and text description. Meanwhile, abstracts were text-mined for the presence of effect size metrics and Bayes factors. Five hundred abstracts were randomly selected and manually checked for the accuracy of automatic text extraction. The extracted statistical significance information was then analysed for temporal trends and distribution in general as well as in subgroups of study designs and journals.

MAIN RESULTS AND THE ROLE OF CHANCE: A total of 24 907 eligible reproductive medicine articles were identified from 170 739 screened articles published in 28 journals. The proportion of abstracts not reporting any statistical significance inference halved from 81% (95% CI, 76-84%) in 1990 to 40% (95% CI, 38-44%) in 2021, while reporting P-values alone remained relatively stable, at 15% (95% CI, 12-18%) in 1990 and 19% (95% CI, 16-22%) in 2021. By contrast, the proportion of abstracts reporting effect measures alone increased considerably from 4.1% (95% CI, 2.6-6.3%) in 1990 to 26% (95% CI, 23-29%) in 2021. Similarly, the proportion of abstracts reporting effect measures together with P-values showed substantial growth from 0.8% (95% CI, 0.3-2.2%) to 14% (95% CI, 12-17%) during the same timeframe. Of 30 182 statistical significance inferences, 56% (n = 17 077) conveyed statistical inferences via P-values alone, 30% (n = 8945) via text description alone such as significant or non-significant, 9.3% (n = 2820) via CIs alone, and 4.7% (n = 1340) via both CI and P-values. The reported P-values (n = 18 417), including both a continuum of P-values and dichotomized P-values, were frequently observed around common cut-off values such as 0.001 (20%), 0.05 (16%), and 0.01 (10%). Of the 13 200 reproductive medicine abstracts containing at least one statistical inference, 77% of abstracts made at least one statistically significant statement. Among articles that reported statistical inference, a decline in the proportion of making at least one statistically significant inference was only seen in RCTs, dropping from 71% (95% CI, 48-88%) in 1990 to 59% (95% CI, 42-73%) in 2021, whereas the proportion in the rest of study types remained almost constant over the years. Of abstracts that reported P-value, 87% (95% CI, 86-88%) reported at least one statistically significant P-value; it was 92% (95% CI, 82-97%) in 1990 and reached its peak at 97% (95% CI, 93-99%) in 2001 before declining to 81% (95% CI, 76-85%) in 2021.

LIMITATIONS, REASONS FOR CAUTION: First, our analysis focused solely on reporting patterns in abstracts but not full-text papers; however, in principle, abstracts should include condensed impartial information and avoid selective reporting. Second, while we attempted to identify all types of statistical significance reporting, our text mining was not flawless. However, the manual assessment showed that inaccuracies were not frequent.

WIDER IMPLICATIONS OF THE FINDINGS: There is a welcome trend that effect measures are increasingly reported in the abstracts of reproductive medicine studies, specifically in RCTs and meta-analyses. Publication reporting bias remains a major concern. Inflated estimates of interventions and risk factors could harm decisions built upon biased evidence, including clinical recommendations and planning of future research.

STUDY FUNDING/COMPETING INTEREST(S): No funding was received for this study. B.W.M. is supported by an NHMRC Investigator grant (GNT1176437); B.W.M. reports research grants and travel support from Merck and consultancy from Merck and ObsEva. W.L. is supported by an NHMRC Investigator Grant (GNT2016729). Q.F. reports receiving a PhD scholarship from Merck. The other author has no conflict of interest to declare.

TRIAL REGISTRATION NUMBER: N/A.

9.
Proc Natl Acad Sci U S A ; 117(32): 19151-19158, 2020 08 11.
Article in English | MEDLINE | ID: mdl-32703808

ABSTRACT

In randomized experiments, Fisher-exact P values are available and should be used to help evaluate results rather than the more commonly reported asymptotic P values. One reason is that using the latter can effectively alter the question being addressed by including irrelevant distributional assumptions. The Fisherian statistical framework, proposed in 1925, calculates a P value in a randomized experiment by using the actual randomization procedure that led to the observed data. Here, we illustrate this Fisherian framework in a crossover randomized experiment. First, we consider the first period of the experiment and analyze its data as a completely randomized experiment, ignoring the second period; then, we consider both periods. For each analysis, we focus on 10 outcomes that illustrate important differences between the asymptotic and Fisher tests for the null hypothesis of no ozone effect. For some outcomes, the traditional P value based on the approximating asymptotic Student's t distribution substantially subceeded the minimum attainable Fisher-exact P value. For the other outcomes, the Fisher-exact null randomization distribution substantially differed from the bell-shaped one assumed by the asymptotic t test. Our conclusions: When researchers choose to report P values in randomized experiments, 1) Fisher-exact P values should be used, especially in studies with small sample sizes, and 2) the shape of the actual null randomization distribution should be examined for the recondite scientific insights it may reveal.
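The Fisherian calculation for a completely randomized experiment can be made concrete: enumerate every treatment assignment the randomization could have produced and recompute the statistic under the sharp null of no effect. The difference in means is used here as an illustrative test statistic:

```python
from itertools import combinations

def fisher_exact_pvalue(outcomes, treated_idx):
    """Exact randomization p-value for the sharp null of no treatment effect.

    Under the sharp null, every unit's outcome is fixed, so the null
    distribution of |mean(treated) - mean(control)| is obtained by
    enumerating all possible assignments of k treated units out of n.
    """
    n = len(outcomes)
    k = len(treated_idx)

    def diff(idx):
        t = [outcomes[i] for i in idx]
        c = [outcomes[i] for i in range(n) if i not in idx]
        return abs(sum(t) / len(t) - sum(c) / len(c))

    obs = diff(set(treated_idx))
    stats = [diff(set(idx)) for idx in combinations(range(n), k)]
    return sum(s >= obs for s in stats) / len(stats)
```

With outcomes [1, 2, 3, 10, 11, 12] and units 3-5 treated, only the two most extreme of the 20 possible assignments reach the observed difference of 9, so the exact p-value is 2/20 = 0.1.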


Subjects
Randomized Controlled Trials as Topic/standards, Cross-Over Studies, Data Interpretation, Statistical, Humans, Statistical Models, Random Allocation, Research Personnel, Sample Size
10.
Biom J ; 65(8): e2200300, 2023 12.
Article in English | MEDLINE | ID: mdl-37789586

ABSTRACT

We give a simulation-based method for computing the multiplicity adjusted p-values and critical constants for the Dunnett procedure for comparing treatments with a control under heteroskedasticity. The Welch-Satterthwaite test statistics used in this procedure do not have a simple multivariate t-distribution because their denominators are mixtures of chi-squares and are correlated because of the common control treatment sample variance present in all denominators. The joint distribution of the denominators of the test statistics is approximated by correlated chi-square variables and is generated using a novel algorithm proposed in this paper. This approximation is used to derive critical constants or adjusted p-values. The familywise error rate (FWER) of the proposed method is compared with some existing methods via simulation under different heteroskedastic scenarios. The results show that our proposed method controls the FWER most accurately, whereas other methods are either too conservative or liberal or control the FWER less accurately. The different methods considered are illustrated on a real data set.


Subjects
Algorithms, Statistical Models, Computer Simulation
11.
Brief Bioinform ; 21(4): 1487-1494, 2020 07 15.
Article in English | MEDLINE | ID: mdl-31298267

ABSTRACT

This note complements and clarifies part of the work of Hawinkel et al. recently published in the journal and suggests some more or less standard tools and methods for carrying out association studies of the microbiome.


Subjects
Microbiota, Statistical Models, Publications
12.
Biometrics ; 78(2): 789-797, 2022 06.
Article in English | MEDLINE | ID: mdl-33559878

ABSTRACT

In dose-response analysis, it is a challenge to choose appropriate linear or curvilinear shapes when considering multiple, differently scaled endpoints. It has been proposed to fit several marginal regression models that try sets of different transformations of the dose levels as explanatory variables for each endpoint. However, the multiple testing problem underlying this approach, involving correlated parameter estimates for the dose effect between and within endpoints, could only be adjusted heuristically. An asymptotic correction for multiple testing can be derived from the score functions of the marginal regression models. Based on a multivariate t-distribution, the correction provides a one-step adjustment of p-values that accounts for the correlation between estimates from different marginal models. The advantages of the proposed methodology are demonstrated through three example datasets, involving generalized linear models with differently scaled endpoints, differing covariates, and a mixed effect model and through simulation results. The methodology is implemented in an R package.


Subjects
Statistical Models, Computer Simulation, Linear Models, Multivariate Analysis
13.
Proc Natl Acad Sci U S A ; 116(4): 1195-1200, 2019 01 22.
Article in English | MEDLINE | ID: mdl-30610179

ABSTRACT

Analysis of "big data" frequently involves statistical comparison of millions of competing hypotheses to discover hidden processes underlying observed patterns of data, for example, in the search for genetic determinants of disease in genome-wide association studies (GWAS). Controlling the familywise error rate (FWER) is considered the strongest protection against false positives but makes it difficult to reach the multiple testing-corrected significance threshold. Here, I introduce the harmonic mean p-value (HMP), which controls the FWER while greatly improving statistical power by combining dependent tests using the generalized central limit theorem. I show that the HMP effortlessly combines information to detect statistically significant signals among groups of individually nonsignificant hypotheses in examples of a human GWAS for neuroticism and a joint human-pathogen GWAS for hepatitis C viral load. The HMP simultaneously tests all ways to group hypotheses, allowing the smallest groups of hypotheses that retain significance to be sought. The power of the HMP to detect significant hypothesis groups is greater than the power of the Benjamini-Hochberg procedure to detect significant hypotheses, although the latter only controls the weaker false discovery rate (FDR). The HMP has broad implications for the analysis of large datasets, because it enhances the potential for scientific discovery.
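The HMP statistic itself is a one-line calculation; a minimal sketch follows. The full procedure in the article also supplies adjusted significance thresholds for interpreting the combined value, which are omitted here:

```python
def harmonic_mean_pvalue(pvals, weights=None):
    """(Weighted) harmonic mean of p-values.

    With equal weights this reduces to k / sum(1/p_i): small p-values
    dominate, so a group containing strong signals yields a small HMP.
    """
    if weights is None:
        weights = [1.0 / len(pvals)] * len(pvals)   # equal weights summing to 1
    return sum(weights) / sum(w / p for w, p in zip(weights, pvals))
```

For p-values 0.01, 0.02, and 0.04 the HMP is 3/175 ≈ 0.017, smaller than their arithmetic mean because the harmonic mean is pulled toward the strongest signal.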


Subjects
Hepacivirus/genetics, Hepatitis C/virology, Viral Load/genetics, Genome-Wide Association Study/methods, Humans, Statistical Models
14.
Synthese ; 200(3): 220, 2022.
Article in English | MEDLINE | ID: mdl-35578622

ABSTRACT

While the common procedure of statistical significance testing and its accompanying concept of p-values have long been surrounded by controversy, renewed concern has been triggered by the replication crisis in science. Many blame statistical significance tests themselves, and some regard them as sufficiently damaging to scientific practice as to warrant being abandoned. We take a contrary position, arguing that the central criticisms arise from misunderstanding and misusing the statistical tools, and that in fact the purported remedies themselves risk damaging science. We argue that banning the use of p-value thresholds in interpreting data does not diminish but rather exacerbates data-dredging and biasing selection effects. If an account cannot specify outcomes that will not be allowed to count as evidence for a claim (if all thresholds are abandoned), then there is no test of that claim. The contributions of this paper are: to explain the rival statistical philosophies underlying the ongoing controversy; to elucidate and reinterpret statistical significance tests, explaining how this reinterpretation ameliorates common misuses and misinterpretations; and to argue that recent recommendations to replace, abandon, or retire statistical significance undermine a central function of statistics in science: to test whether observed patterns in the data are genuine or due to background variability.

15.
Genet Epidemiol ; 44(4): 339-351, 2020 06.
Article in English | MEDLINE | ID: mdl-32100375

ABSTRACT

Testing millions of single nucleotide polymorphisms (SNPs) in genetic association studies has become a standard routine for disease gene discovery. In light of recent re-evaluation of statistical practice, it has been suggested that p-values are unfit as summaries of statistical evidence. Despite this criticism, p-values contain information that can be utilized to address the concerns about their flaws. We present a new method for utilizing evidence summarized by p-values for estimating odds ratio (OR) based on its approximate posterior distribution. In our method, only p-values, sample size, and standard deviation for ln(OR) are needed as summaries of data, accompanied by a suitable prior distribution for ln(OR) that can assume any shape. The parameter of interest, ln(OR), is the only parameter with a specified prior distribution, hence our model is a mix of classical and Bayesian approaches. We show that our method retains the main advantages of the Bayesian approach: it yields direct probability statements about hypotheses for OR and is resistant to biases caused by selection of top-scoring SNPs. Our method enjoys greater flexibility than similarly inspired methods in the assumed distribution for the summary statistic and in the form of the prior for the parameter of interest. We illustrate our method by presenting interval estimates of effect size for reported genetic associations with lung cancer. Although we focus on OR, the method is not limited to this particular measure of effect size and can be used broadly for assessing reliability of findings in studies testing multiple predictors.
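The core back-calculation described above can be sketched for the special case of a normal prior; the paper allows arbitrary prior shapes, and the prior standard deviation of 0.5 below is a made-up illustrative choice:

```python
from math import sqrt
from statistics import NormalDist

def posterior_log_or(p_value, se, sign=1.0, prior_mean=0.0, prior_sd=0.5):
    """Approximate posterior for ln(OR) recovered from summary statistics.

    The two-sided p-value and standard error imply a point estimate of
    ln(OR); combining it with a normal prior via normal-normal conjugacy
    gives an approximate posterior mean and standard deviation.
    """
    z = NormalDist().inv_cdf(1 - p_value / 2)      # |z| implied by the p-value
    est = sign * z * se                            # implied estimate of ln(OR)
    post_var = 1 / (1 / se**2 + 1 / prior_sd**2)   # normal-normal conjugacy
    post_mean = post_var * (est / se**2 + prior_mean / prior_sd**2)
    return post_mean, sqrt(post_var)
```

Shrinkage toward the prior mean is what makes such estimates resistant to the winner's-curse bias of top-scoring SNPs.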


Subjects
Disease Susceptibility, Genetic Models, Bayes Theorem, Genetic Loci, Humans, Polymorphism, Single Nucleotide
16.
Genet Epidemiol ; 44(8): 841-853, 2020 11.
Article in English | MEDLINE | ID: mdl-32779262

ABSTRACT

Many variants with low frequencies or with low to modest effects likely remain unidentified in genome-wide association studies (GWAS) because of stringent genome-wide thresholds for detection. To improve the power of detection, variant prioritization based on their functional annotations and epigenetic landmarks has been used successfully. Here, we propose a novel method of prioritization of a GWAS by exploiting gene-level knowledge (e.g., annotations to pathways and ontologies) and show that it further improves power. Often, disease associated variants are found near genes that are coinvolved in specific biological pathways relevant to disease process. Utilization of this knowledge to conduct a prioritized scan increases the power to detect loci that map to genes clustered in a few specific pathways. We have developed a computationally scalable framework based on penalized logistic regression (termed GKnowMTest, Genomic Knowledge-guided Multiple Testing) to enable a prioritized pathway-guided GWAS scan with a very large number of gene-level annotations. We demonstrate that the proposed strategy improves overall power and maintains the Type 1 error globally. Our method works on genome-wide summary level data and a user-specified list of pathways (e.g., those extracted from large pathway databases without reference to biology of a specific disease). It automatically reweights the input p values by incorporating the pathway enrichments as "adaptively learned" from the data using a cross-validation technique to avoid overfitting. We used whole-genome simulations and some publicly available GWAS data sets to illustrate the application of our method. The GKnowMTest framework has been implemented as a user-friendly open-source R package.


Subjects
Genome-Wide Association Study, Computer Simulation, Genetic Databases, Diabetes Mellitus, Type 2/genetics, Human Genome, Humans, Genetic Models, Molecular Sequence Annotation, Polymorphism, Single Nucleotide/genetics
17.
Genet Epidemiol ; 44(4): 330-338, 2020 06.
Article in English | MEDLINE | ID: mdl-32043633

ABSTRACT

Gene-set analyses are used to assess whether there is any evidence of association with disease among a set of biologically related genes. Such an analysis typically treats all genes within the sets similarly, even though there is substantial, external, information concerning the likely importance of each gene within each set. For example, for traits that are under purifying selection, we would expect genes showing extensive genic constraint to be more likely to be trait associated than unconstrained genes. Here we improve gene-set analyses by incorporating such external information into a higher-criticism-based signal detection analysis. We show that when this external information is predictive of whether a gene is associated with disease, our approach can lead to a significant increase in power. Further, our approach is particularly powerful when the signal is sparse, that is when only a small number of genes within the set are associated with the trait. We illustrate our approach with a gene-set analysis of amyotrophic lateral sclerosis (ALS) and implicate a number of gene-sets containing SOD1 and NEK1 as well as showing enrichment of small p values for gene-sets containing known ALS genes. We implement our approach in the R package wHC.
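The unweighted higher-criticism statistic at the heart of this approach has a compact form; a sketch follows (the paper's wHC variant adds external weights, which are omitted here):

```python
from math import sqrt

def higher_criticism(pvals):
    """Donoho-Jin higher-criticism statistic over sorted p-values.

    Compares the empirical fraction of p-values below each sorted value
    against its null expectation, standardized binomially; large values
    indicate a sparse set of non-null hypotheses.
    """
    n = len(pvals)
    hc = float("-inf")
    for i, pi in enumerate(sorted(pvals), start=1):
        if 0 < pi < 1:
            hc = max(hc, sqrt(n) * (i / n - pi) / sqrt(pi * (1 - pi)))
    return hc
```

A single very small p-value among otherwise unremarkable ones is enough to drive the statistic up, which is why higher criticism is well suited to sparse signals.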


Subjects
Amyotrophic Lateral Sclerosis/genetics, Amyotrophic Lateral Sclerosis/pathology, Exome/genetics, Genetic Predisposition to Disease, Genetic Variation, Humans, NIMA-Related Kinase 1/genetics, Superoxide Dismutase-1/genetics, User-Computer Interface
18.
Article in English | MEDLINE | ID: mdl-33156000

ABSTRACT

Combining correlated p-values from multiple hypothesis testing is one of the most frequently used methods for integrating information in genetic and genomic data analysis. However, most existing methods for combining independent p-values from individual component problems into a single unified p-value are unsuitable for the correlational structure among p-values from multiple hypothesis testing. Although some existing p-value combination methods have been modified to overcome the potential limitations, there is no uniformly most powerful method for combining correlated p-values in genetic data analysis. A p-value combination method that robustly controls the type I error rate while retaining good power is therefore needed. In this paper, we propose an empirical method based on the gamma distribution (EMGD) for combining dependent p-values from multiple hypothesis testing. The proposed test, EMGD, allows for flexibly accommodating the highly correlated p-values from multiple hypothesis testing into a unified p-value for examining the combined hypothesis of interest. The EMGD retains the robustness of the empirical Brown's method (EBM) for pooling dependent p-values from multiple hypothesis testing. Moreover, the EMGD inherits from the gamma distribution the advantages of both the z-transform test and the gamma-transform test for combining dependent p-values from multiple statistical tests. Together, these two properties give the EMGD robust power for combining dependent p-values from multiple hypothesis testing. The performance of the proposed method EMGD is illustrated with simulations and real-data applications, in comparison with existing methods such as Kost and McDermott's method, the EBM, and the harmonic mean p-value method.
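EMGD itself cannot be reproduced from the abstract alone, but the classical baseline it extends, Fisher's combination of independent p-values, is compact: -2·Σ ln p follows a chi-square distribution with 2k degrees of freedom, whose survival function has a closed form for even df:

```python
from math import exp, log

def fisher_combine(pvals):
    """Fisher's method for combining k independent p-values.

    The statistic x = -2 * sum(log p_i) is chi-square with 2k df under the
    global null; for even df the survival function is the closed form
    exp(-x/2) * sum_{j<k} (x/2)^j / j!, computed here without SciPy.
    """
    k = len(pvals)
    x = -2.0 * sum(log(p) for p in pvals)
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= (x / 2) / j       # builds (x/2)^j / j! incrementally
        total += term
    return exp(-x / 2) * total
```

A single p-value passes through unchanged, and two independent p-values of 0.05 combine to roughly 0.017; correlated inputs would require an EBM- or EMGD-style correction.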

19.
Paediatr Perinat Epidemiol ; 35(1): 8-23, 2021 01.
Article in English | MEDLINE | ID: mdl-33269490

ABSTRACT

The "replication crisis" has been attributed to perverse incentives that lead to selective reporting and misinterpretations of P-values and confidence intervals. A crude fix offered for this problem is to lower testing cut-offs (α levels), either directly or in the form of null-biased multiple comparisons procedures such as naïve Bonferroni adjustments. Methodologists and statisticians have expressed positions that range from condemning all such procedures to demanding their application in almost all analyses. Navigating between these unjustifiable extremes requires defining analysis goals precisely enough to separate inappropriate from appropriate adjustments. To meet this need, I here review issues arising in single-parameter inference (such as error costs and loss functions) that are often skipped in basic statistics, yet are crucial to understanding controversies in testing and multiple comparisons. I also review considerations that should be made when examining arguments for and against modifications of decision cut-offs and adjustments for multiple comparisons. The goal is to provide researchers a better understanding of what is assumed by each side and to enable recognition of hidden assumptions. Basic issues of goal specification and error costs are illustrated with simple fixed cut-off hypothesis testing scenarios. These illustrations show how adjustment choices are extremely sensitive to implicit decision costs, making it inevitable that different stakeholders will vehemently disagree about what is necessary or appropriate. Because decisions cannot be justified without explicit costs, resolution of inference controversies is impossible without recognising this sensitivity. Pre-analysis statements of funding, scientific goals, and analysis plans can help counter demands for inappropriate adjustments, and can provide guidance as to what adjustments are advisable. 
Hierarchical (multilevel) regression methods (including Bayesian, semi-Bayes, and empirical-Bayes methods) provide preferable alternatives to conventional adjustments, insofar as they facilitate use of background information in the analysis model, and thus can provide better-informed estimates on which to base inferences and decisions.
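A toy version of the empirical-Bayes shrinkage recommended above, for the normal-normal case with a shared standard error; the method-of-moments estimates below are the simplest possible choice, not the semi-Bayes machinery of the paper:

```python
from statistics import mean, pvariance

def eb_shrink(estimates, se):
    """Toy empirical-Bayes (normal-normal) shrinkage toward the grand mean.

    Estimates the between-study variance by method of moments and shrinks
    each estimate by the usual factor b = se^2 / (se^2 + tau^2); when the
    observed spread is no larger than sampling noise, everything shrinks
    fully to the grand mean.
    """
    mu = mean(estimates)
    tau2 = max(0.0, pvariance(estimates) - se**2)   # method-of-moments estimate
    b = se**2 / (se**2 + tau2) if (se**2 + tau2) > 0 else 1.0
    return [b * mu + (1 - b) * x for x in estimates]
```

With a large standard error the spread of [0, 1, 2] is indistinguishable from noise and all three estimates collapse to the mean; with a smaller standard error they are only partially pulled in.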


Subjects
Goals, Research Design, Bayes Theorem, Humans
20.
J Card Surg ; 36(11): 4322-4331, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34477260

ABSTRACT

Null hypothesis significance testing (NHST) and p-values are widespread in the cardiac surgical literature but are frequently misunderstood and misused. The purpose of the review is to discuss major disadvantages of p-values and suggest alternatives. We describe diagnostic tests, the prosecutor's fallacy in the courtroom, and NHST, which involve inter-related conditional probabilities, to help clarify the meaning of p-values, and discuss the enormous sampling variability, or unreliability, of p-values. Finally, we use a cardiac surgical database and simulations to explore further issues involving p-values. In clinical studies, p-values provide a poor summary of the observed treatment effect, whereas the three-number summary provided by effect estimates and confidence intervals is more informative and minimizes over-interpretation of a "significant" result. p-values are an unreliable measure of the strength of evidence; if used at all they give only, at best, a very rough guide to decision making. Researchers should adopt Open Science practices to improve the trustworthiness of research and, where possible, use estimation (three-number summaries) or other better techniques.
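The "three-number summary" advocated above is simply the effect estimate with its confidence limits; a sketch for a difference in means with a known standard error (the function name and inputs are illustrative):

```python
from statistics import NormalDist

def three_number_summary(mean_a, mean_b, se_diff, conf=0.95):
    """Effect estimate plus confidence limits: the three-number summary.

    Returns (estimate, lower, upper) for the difference in means, using a
    normal critical value; unlike a bare p-value, this conveys both the
    size of the effect and the precision with which it is estimated.
    """
    z = NormalDist().inv_cdf(0.5 + conf / 2)   # e.g. 1.96 for 95%
    diff = mean_a - mean_b
    return diff, diff - z * se_diff, diff + z * se_diff
```

For group means of 10 and 8 with a standard error of 1, the summary is 2.0 (95% CI, 0.04 to 3.96): "significant" at 0.05, but clearly imprecise.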


Subjects
Research Design, Bayes Theorem, Humans, Probability