Results 1 - 19 of 19
1.
Biom J ; 66(2): e2200204, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38356198

ABSTRACT

Storey's estimator for the proportion of true null hypotheses, originally proposed under the continuous framework, is modified in this work for the discrete framework. The modification yields improved estimation of the parameter of interest. The proposed estimator is used to formulate an adaptive version of the Benjamini-Hochberg procedure, and control of the false discovery rate by this adaptive procedure is proved analytically. The proposed estimator is also used to formulate an adaptive version of the Benjamini-Hochberg-Heyse procedure, whose conservative nature is established through simulation experiments. A substantial gain in power is observed for the new adaptive procedures over the standard procedures. The proposed method is demonstrated on two important real-life gene expression data sets, one from a study of HIV and the other from a methylation study.


Subject(s)
Computer Simulation
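
For orientation, a minimal sketch of the two classical ingredients the abstract builds on (Storey's continuous-framework estimator and the adaptive BH step-up rule), not the authors' discrete modification; the tuning parameter lam and the clipping of pi0 are conventional choices, not taken from the paper.

    import numpy as np

    def storey_pi0(pvals, lam=0.5):
        # Storey: fraction of p-values above lam, inflated by 1/(1 - lam);
        # p-values from true nulls exceed lam with probability ~ (1 - lam)
        pvals = np.asarray(pvals)
        pi0 = np.mean(pvals > lam) / (1.0 - lam)
        return min(max(pi0, 1.0 / len(pvals)), 1.0)  # keep within (0, 1]

    def adaptive_bh(pvals, alpha=0.05, lam=0.5):
        # adaptive BH: run the BH step-up rule at level alpha / pi0-hat
        p = np.asarray(pvals)
        m = len(p)
        pi0 = storey_pi0(p, lam)
        order = np.argsort(p)
        passed = p[order] <= alpha * np.arange(1, m + 1) / (m * pi0)
        k = passed.nonzero()[0].max() + 1 if passed.any() else 0
        reject = np.zeros(m, dtype=bool)
        reject[order[:k]] = True
        return reject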
2.
J Appl Stat ; 49(14): 3591-3613, 2022.
Article in English | MEDLINE | ID: mdl-36246854

ABSTRACT

Two recently introduced model-based bias-corrected estimators for the proportion of true null hypotheses (π0) in multiple hypothesis testing have been restructured for random observations arising under a suitable failure model, available for each of the common hypotheses. Based on stochastic ordering, a new motivation is given for the formulation of some related estimators of π0. The reduction in bias for the model-based estimators is justified theoretically, and algorithms for computing the estimators are presented. The estimators are also used to formulate a popular adaptive multiple testing procedure. An extensive numerical study supports the superiority of the bias-corrected estimators. The necessity of a proper distributional assumption for the failure data in the model-based bias-corrected method is highlighted. A case study with a real-life dataset from reliability and warranty studies demonstrates the applicability of the procedure under a non-Gaussian setup; the results are in line with the intuition and experience of the subject expert. The article concludes with a discussion that also indicates the future scope of study.

3.
J Am Stat Assoc ; 117(538): 823-834, 2022.
Article in English | MEDLINE | ID: mdl-35845434

ABSTRACT

In this paper, we consider the detection of signal regions associated with disease outcomes in whole genome association studies. Gene- or region-based methods have become increasingly popular in whole genome association analysis as a complement to traditional individual-variant analysis. However, these methods test for association between an outcome and the genetic variants in a pre-specified region, e.g., a gene. In view of the massive intergenic regions in whole genome sequencing (WGS) studies, we propose a computationally efficient quadratic scan (Q-SCAN) statistic to detect the existence and locations of signal regions by scanning the genome continuously. The proposed method accounts for the correlation (linkage disequilibrium) among genetic variants, allows signal regions to contain both causal and neutral variants, and allows the effects of signal variants to be in different directions. We study the asymptotic properties of the proposed Q-SCAN statistic, derive an empirical threshold that controls the family-wise error rate, and show that under regularity conditions the proposed method consistently selects the true signal regions. Simulation studies evaluating finite-sample performance show that the proposed procedure outperforms existing methods, especially when signal regions have causal variants whose effects are in different directions or are contaminated with neutral variants. We illustrate Q-SCAN by analyzing WGS data from the Atherosclerosis Risk in Communities study.
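
To fix the scanning idea (not the Q-SCAN statistic itself, whose quadratic form and analytic threshold are derived in the paper), a brute-force toy in Python: slide windows of every length over per-variant scores and keep the window maximizing a sum-of-squares statistic, which is insensitive to effect direction. Window lengths and the scoring are illustrative placeholders; the real method is computationally efficient rather than exhaustive.

    import numpy as np

    def scan_regions(scores, min_len=5, max_len=50):
        # scores: per-variant association scores along the genome
        best, best_window = -np.inf, None
        for L in range(min_len, max_len + 1):
            for start in range(len(scores) - L + 1):
                w = scores[start:start + L]
                stat = np.sum(np.square(w)) / L   # toy quadratic statistic
                if stat > best:
                    best, best_window = stat, (start, start + L)
        return best, best_window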

4.
Financ Res Lett ; 44: 102049, 2022 Jan.
Article in English | MEDLINE | ID: mdl-35475023

ABSTRACT

The COVID-19 global pandemic has disrupted business as usual, affecting sustained economic development across countries. However, economic uncertainty following COVID-19 containment measures appears to favor market signals of cryptocurrencies. This study empirically and structurally investigates the implications of COVID-19 health outcomes for the market prices of Bitcoin, Bitcoin Cash, Ethereum, and Litecoin. Evidence from the novel Romano-Wolf multiple-hypothesis testing procedure reveals that COVID-19 shocks spur Litecoin by 3.20-3.84%, Bitcoin by 2.71-3.27%, Ethereum by 1.43-1.75%, and Bitcoin Cash by 1.34-1.62%.
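
The Romano-Wolf stepdown procedure invoked here can be sketched as follows, assuming the user supplies observed statistics and a matrix of bootstrap statistics recentered under the null (the bootstrap machinery, which depends on the regression design, is not shown; a production analysis would use a dedicated package).

    import numpy as np

    def romano_wolf(t_obs, t_boot):
        # t_obs: (m,) observed test statistics; t_boot: (B, m) bootstrap
        # statistics recentered under the null, preserving cross-test correlation
        t_obs = np.abs(np.asarray(t_obs))
        m = len(t_obs)
        order = np.argsort(-t_obs)                # most significant first
        p_adj = np.empty(m)
        for step, j in enumerate(order):
            # stepdown: maximum over hypotheses not yet stepped past
            max_boot = np.abs(t_boot[:, order[step:]]).max(axis=1)
            p_adj[j] = np.mean(max_boot >= t_obs[j])
        for a, b in zip(order, order[1:]):        # enforce monotone p-values
            p_adj[b] = max(p_adj[b], p_adj[a])
        return p_adj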

5.
Biom J ; 64(2): 361-376, 2022 02.
Article in English | MEDLINE | ID: mdl-33837570

ABSTRACT

In a paper published in 1939 in The Annals of Mathematical Statistics, Wald and Wolfowitz discussed the possible validity of a probability inequality between one- and two-sided coverage probabilities for the empirical distribution function. Twenty-eight years later, Vandewiele and Noé proved this inequality for Kolmogorov-Smirnov type goodness-of-fit tests. We refer to this type of inequality as a one-two inequality. In this paper, we generalize their result to one- and two-sided union-intersection tests based on positively associated random variables and processes, giving a brief review of different notions of positive association and corresponding results. Moreover, we introduce the notion of one-two dependence and discuss its relationships with other dependence concepts. While positive association implies one-two dependence, the reverse implication fails. Last but not least, the Bonferroni inequality and the one-two inequality yield lower and upper bounds for two-sided acceptance/rejection probabilities which differ only slightly for significance levels that are not too large. We discuss several examples where the one-two inequality applies. Finally, we briefly discuss the possible impact of the validity of a one-two inequality on directional error control in multiple testing.


Subject(s)
Probability
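
For intuition, one reading of the closing remark (our reconstruction; the abstract's exact formulation may differ): with A1, A2 the one-sided acceptance events at levels α1, α2, both inequalities bound the two-sided acceptance probability from below, and the bounds nearly coincide for small levels.

    % A_1, A_2: one-sided acceptance events at levels \alpha_1, \alpha_2
    P(A_1 \cap A_2) \;\ge\; P(A_1) + P(A_2) - 1 \;=\; 1 - \alpha_1 - \alpha_2
        \quad \text{(Bonferroni)}
    P(A_1 \cap A_2) \;\ge\; P(A_1)\,P(A_2) \;=\; (1 - \alpha_1)(1 - \alpha_2)
        \quad \text{(one-two, under positive association)}
    % The bounds differ by exactly \alpha_1 \alpha_2, which is negligible
    % for the usual significance levels (e.g. 0.0025 at \alpha_i = 0.05).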
6.
Article in English | MEDLINE | ID: mdl-34501892

ABSTRACT

Multiplicity arises when data analysis involves multiple simultaneous inferences, increasing the chance of spurious findings. It is a widespread problem frequently ignored by researchers. In this paper, we perform an exploratory analysis of the Web of Science database for COVID-19 observational studies. We examined the 100 top-cited COVID-19 peer-reviewed articles based on p-values; these included up to 7100 simultaneous tests, with 50% including more than 34 tests and 20% more than 100 tests. We found that the larger the number of tests performed, the larger the number of significant results (r = 0.87, p < 10⁻⁶). The number of p-values in the abstracts was not related to the number of p-values in the papers. However, the number of highly significant results (p < 0.001) in the abstracts was strongly correlated (r = 0.61, p < 10⁻⁶) with the number of p < 0.001 significances in the papers. Furthermore, the abstracts included a higher proportion of significant results (0.91 vs. 0.50), and 80% reported only significant results. Only one reviewed paper addressed multiplicity-induced type I error inflation, pointing to potentially spurious results bypassing the peer-review process. We conclude that special attention must be paid to the increased chance of false discoveries in observational studies, including non-replicated striking discoveries with a potentially large social impact, and we propose some easy-to-implement measures to assess and limit the effects of multiplicity.


Subject(s)
COVID-19 , Humans , Peer Review , Probability , SARS-CoV-2
7.
Environ Res ; 201: 111600, 2021 10.
Article in English | MEDLINE | ID: mdl-34214558

ABSTRACT

We analyse the paper "The spread of SARS-CoV-2 in Spain: Hygiene habits, sociodemographic profile, mobility patterns and comorbidities" by Rodríguez-Barranco et al. (2021), published in Environmental Research, vol. 192, January 2021. The study was carried out under challenging conditions and provides original data of great value for exploratory purposes. Nevertheless, we found that the authors did not consider that the multiple hypothesis tests carried out in reaching the final model increase the occurrence of false discoveries by mere chance. After adjusting the results in the paper for the effects of multiple testing, we conclude that only one of the five factors cited as statistically significant and relevant in the article, living with someone who has suffered from COVID-19, remained significantly related to the relative prevalence of COVID-19. Therefore, the preeminent role the analysed work gives to walking the dog as one of the main transmission routes of COVID-19 probably does not correspond to an actual effect; until replicated by other studies, it should be considered a spurious discovery.


Subject(s)
COVID-19 , Animals , Dogs , Humans , SARS-CoV-2 , Spain , Walking
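
The kind of adjustment argued for here is a one-liner in standard software. A sketch using statsmodels with Holm's method; the five p-values below are placeholders, not the values reported for the factors in the original paper.

    from statsmodels.stats.multitest import multipletests

    # placeholder p-values for five factors reported as significant
    pvals = [0.001, 0.02, 0.03, 0.04, 0.045]
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")
    print(list(zip(p_adj, reject)))  # only the smallest survives adjustment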
8.
J Diabetes Sci Technol ; 15(1): 141-146, 2021 01.
Article in English | MEDLINE | ID: mdl-31640408

ABSTRACT

INTRODUCTION: It is important to have accurate information about when individuals with type 1 diabetes have eaten and taken insulin so that those events can be reconciled with their blood glucose levels throughout the day. Insulin pumps and connected insulin pens provide records of when the user injected insulin and how many carbohydrates were recorded, but it is often unclear when meals occurred. This project demonstrates a method for estimating meal times using a multiple hypothesis approach. METHODS: When an insulin dose was recorded, multiple hypotheses were generated describing variations of when the meal in question occurred. As postprandial glucose values informed the model, the posterior probability of each hypothesis was evaluated, and from these posterior probabilities an expected meal time was found. The method was tested using simulation and a clinical data set (n = 11), with either uniform or normally distributed (µ = 0, σ = 10 or 20 minutes) prior probabilities for the hypothesis set. RESULTS: For the simulation data set, meal times were estimated with an average error of -0.77 (±7.94) minutes when uniform priors were used, and -0.99 (±8.55) and -0.88 (±7.84) minutes for normally distributed priors (σ = 10 and 20 minutes). For the clinical data set, the average estimation error was 0.02 (±30.87), 1.38 (±21.58), and 0.04 (±27.52) minutes for the uniform and normal priors (σ = 10 and 20 minutes). CONCLUSION: This technique could be used to advise physicians about the mealtime insulin dosing behaviors of their patients and potentially inform changes in treatment strategy.


Subject(s)
Diabetes Mellitus, Type 1 , Blood Glucose , Cross-Over Studies , Diabetes Mellitus, Type 1/drug therapy , Humans , Insulin , Meals , Postprandial Period
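
In outline, the update step reads as follows (a minimal sketch, not the authors' glucose model): each hypothesis is a candidate meal offset, its posterior weight comes from how well a postprandial glucose prediction matches the observations, and the estimate is the posterior mean. The offsets, prior width, and likelihood inputs are illustrative assumptions.

    import numpy as np

    def estimate_meal_time(offsets, prior, likelihoods):
        # offsets: candidate meal times relative to the recorded dose (min)
        # prior: prior probability of each candidate (uniform or normal)
        # likelihoods: p(observed glucose | meal at offset), supplied by a
        # postprandial glucose prediction model (not shown)
        post = prior * likelihoods
        post /= post.sum()
        return float(np.dot(offsets, post))   # posterior-mean meal time

    offsets = np.arange(-60.0, 61.0, 5.0)
    prior = np.exp(-0.5 * (offsets / 20.0) ** 2)   # normal prior, sigma = 20 min
    prior /= prior.sum()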
9.
R Soc Open Sci ; 7(6): 200231, 2020 Jun.
Article in English | MEDLINE | ID: mdl-32742690

ABSTRACT

Science provides a method to learn about the relationships between observed patterns and the processes that generate them. However, inference can be confounded when an observed pattern cannot be clearly and wholly attributed to a hypothesized process. Over-reliance on traditional single-hypothesis methods (i.e. null hypothesis significance testing) has resulted in replication crises in several disciplines, and ecology exhibits features common to these fields (e.g. low-power study designs, questionable research practices, etc.). Considering multiple working hypotheses in combination with pre-data collection modelling can be an effective means to mitigate many of these problems. We present a framework for explicitly modelling systems in which relevant processes are commonly omitted, overlooked or not considered and provide a formal workflow for a pre-data collection analysis of multiple candidate hypotheses. We advocate for and suggest ways that pre-data collection modelling can be combined with consideration of multiple working hypotheses to improve the efficiency and accuracy of research in ecology.

10.
Anal Chim Acta ; 1097: 49-61, 2020 Feb 08.
Article in English | MEDLINE | ID: mdl-31910969

ABSTRACT

Clinical metabolomics aims at finding statistically significant differences between the metabolic statuses of patient and control groups, with the intention of understanding pathobiochemical processes and identifying clinically useful biomarkers of particular diseases. After the raw measurements are integrated and pre-processed as intensities of chromatographic peaks, the differences between controls and patients are evaluated by both univariate and multivariate statistical methods. The traditional univariate approach relies on t-tests (or their nonparametric alternatives), and the results from multiple testing are misleadingly compared merely by p-values using the so-called volcano plot. This paper proposes a Bayesian counterpart to the widespread univariate analysis, taking into account the compositional character of a metabolome. Since each metabolome is a collection of small-molecule metabolites in a biological material, the relative structure of metabolomic data, inherently contained in the ratios between metabolites, is of main interest; a proper choice of logratio coordinates is therefore an essential step for any statistical analysis of such data. In addition, a concept of b-values is introduced, together with a Bayesian version of the volcano plot incorporating the distance levels of the posterior highest density intervals from zero. The theoretical background is illustrated using two data sets containing samples of patients suffering from 3-hydroxy-3-methylglutaryl-CoA lyase deficiency and medium-chain acyl-CoA dehydrogenase deficiency. To evaluate the stability of the proposed method as well as the benefits of the compositional approach, two simulations designed to mimic a loss of samples and a systematic measurement error, respectively, are added.


Subject(s)
Acetyl-CoA C-Acetyltransferase/deficiency , Acyl-CoA Dehydrogenase/deficiency , Amino Acid Metabolism, Inborn Errors/metabolism , Bayes Theorem , Lipid Metabolism, Inborn Errors/metabolism , Metabolomics , Acetyl-CoA C-Acetyltransferase/metabolism , Acyl-CoA Dehydrogenase/metabolism , Datasets as Topic , Humans
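
The compositional viewpoint means the analysis happens in logratio coordinates. A minimal centered-logratio (clr) sketch; the paper's actual coordinate choice and Bayesian machinery are richer, and the sample values below are made up.

    import numpy as np

    def clr(intensities):
        # centered logratio: log of each part relative to the geometric mean
        # of the composition, so only ratios between metabolites matter
        logx = np.log(np.asarray(intensities, dtype=float))
        return logx - logx.mean(axis=-1, keepdims=True)

    sample = [120.0, 35.0, 980.0, 4.2]   # peak intensities (arbitrary scale)
    print(clr(sample))                    # invariant to rescaling the sample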
11.
Comput Stat Data Anal ; 136: 123-136, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31662591

ABSTRACT

An optimal and flexible multiple hypothesis testing procedure is constructed for dependent data based on Bayesian techniques, aimed at handling two challenges: the dependence structure and non-null distribution specification. Ignoring dependence among hypothesis tests may lead to loss of efficiency and biased decisions, while misspecification of the non-null distribution can result in both false positive and false negative errors. Hidden Markov models are used to accommodate the dependence structure among the tests, and a Dirichlet process mixture prior is placed on the non-null distribution to overcome the potential pitfalls of distribution misspecification. The testing algorithm, based on Bayesian techniques, optimizes the false negative rate (FNR) while controlling the false discovery rate (FDR). The procedure is applied to pointwise and clusterwise analysis, and its performance is compared with existing approaches using both simulated and real data examples.

12.
Stat Biopharm Res ; 11(3): 210-219, 2019.
Article in English | MEDLINE | ID: mdl-31467642

ABSTRACT

The neoadjuvant (preoperative) approach to breast cancer treatment has become widely accepted. Traditionally, the primary objective of neoadjuvant treatment is to improve the subsequent surgical intervention, and effectiveness is often evaluated by the ability to achieve pathological complete response (pCR), the eradication of the malignant disease in the breast and axillary lymph nodes. More recently, neoadjuvant treatment has also become recognized as an in vivo, preoperative 'window of opportunity' to explore the efficacy of novel agents such as immunotherapies, where a wealth of tumor biomarkers is also routinely collected to quantify antitumor immunity. However, one challenge in combining the traditional pCR endpoint with efficacy biomarkers is that these tumor biomarkers are only partially available: they cannot be measured in patients who have achieved pCR. In this article, a stepwise hypothesis testing procedure is proposed to combine a continuous tumor biomarker with the conventional binary endpoint in a two-arm randomized phase II superiority trial to improve statistical power. The operating characteristics of the proposed procedure are illustrated with a real-world example, and its performance is also evaluated numerically.

13.
Sensors (Basel) ; 19(14), 2019 Jul 15.
Article in English | MEDLINE | ID: mdl-31311122

ABSTRACT

In multi-sensor fusion (MSF), integrating multi-sensor observation data with different observation errors to achieve more accurate positioning of the target has long been a research focus. In this study, a modified ensemble Kalman filter (EnKF) is presented as a substitute for the traditional Kalman filter (KF) in multiple hypothesis tracking (MHT), to deal with the high nonlinearity that commonly arises in multiple target tracking (MTT) problems. In addition, multi-source observation data fusion is realized using the modified EnKF, which enables low-precision observation data to be corrected by high-precision observation data, with the accuracy of the corrected data calibrated by the statistical information the EnKF provides. Numerical studies demonstrate the effectiveness of the proposed method and show that the MHT-EnKF method achieves remarkable improvements in handling nonlinear movement variation and in positioning accuracy for MTT problems in the MSF scenario.
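
For reference, the stochastic EnKF analysis step that sits at the heart of such a scheme, sketched under our simplifying assumption of a linear observation operator H (the paper's modifications and the MHT integration are not shown).

    import numpy as np

    def enkf_update(ensemble, H, y, R, rng=None):
        # ensemble: (N, n) state ensemble; H: (m, n) observation operator
        # y: (m,) observation; R: (m, m) observation error covariance
        if rng is None:
            rng = np.random.default_rng(0)
        N = ensemble.shape[0]
        X = ensemble - ensemble.mean(axis=0)      # state anomalies
        Y = X @ H.T                                # predicted-obs anomalies
        P_yy = Y.T @ Y / (N - 1) + R
        P_xy = X.T @ Y / (N - 1)
        K = P_xy @ np.linalg.inv(P_yy)            # Kalman gain
        # perturbed observations give the analysis ensemble correct spread
        y_pert = y + rng.multivariate_normal(np.zeros(len(y)), R, size=N)
        return ensemble + (y_pert - ensemble @ H.T) @ K.T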

14.
Biom J ; 60(4): 761-779, 2018 07.
Article in English | MEDLINE | ID: mdl-29748972

ABSTRACT

We consider multiple testing with false discovery rate (FDR) control when p-values have discrete and heterogeneous null distributions. We propose a new estimator of the proportion of true null hypotheses and demonstrate that it is less upwardly biased than Storey's estimator and two other estimators. The new estimator induces two adaptive procedures: an adaptive Benjamini-Hochberg (BH) procedure and an adaptive Benjamini-Hochberg-Heyse (BHH) procedure. We prove that the adaptive BH (aBH) procedure is conservative nonasymptotically. Through simulation studies, we show that these procedures are usually more powerful than their nonadaptive counterparts, and that the adaptive BHH procedure is usually more powerful than the aBH procedure and a procedure based on randomized p-values. The adaptive procedures are applied to a study of HIV vaccine efficacy, where they identify more differentially polymorphic positions than the BH procedure at the same FDR level.


Subject(s)
Biometry/methods , False Positive Reactions
15.
Stat Med ; 36(25): 4007-4027, 2017 Nov 10.
Article in English | MEDLINE | ID: mdl-28786130

ABSTRACT

With increasingly abundant spatial data in the form of case counts or rates aggregated over areal regions (e.g., ZIP codes, census tracts, or counties), interest turns to formal identification of difference "boundaries," or barriers on the map, in addition to the estimated statistical map itself. "Boundary" here refers to a border describing vastly disparate outcomes in adjacent areal units, perhaps caused by latent risk factors. This article focuses on developing a model-based statistical tool to identify difference boundaries in maps with a small number of areal units, referred to as small-scale maps. We propose a novel and robust nonparametric boundary detection rule, the Dirichlet process wombling (DPW) rule, employing Dirichlet process-based mixture models for small-scale maps. Unlike recently proposed nonparametric boundary detection rules based on false discovery rates, the DPW rule is free of ad hoc parameters, computationally simple, and readily implementable in freely available software for public health practitioners such as JAGS and OpenBUGS, and yet provides statistically interpretable boundary detection in small-scale wombling. We offer a detailed simulation study and an application of the proposed approach to urinary bladder cancer incidence rates between 1990 and 2012 in the eight counties of Connecticut.


Subject(s)
Bayes Theorem , Small-Area Analysis , Spatial Analysis , Statistics, Nonparametric , Computer Simulation , Connecticut/epidemiology , Humans , Maps as Topic , Probability , Risk Factors , Urinary Bladder Neoplasms/epidemiology
16.
Stat Med ; 34(25): 3362-75, 2015 Nov 10.
Article in English | MEDLINE | ID: mdl-26112310

ABSTRACT

Many gene expression datasets arise from two experiments in which the expressions of the targeted genes are correlated. We consider problems whose objective is to find genes that are simultaneously upregulated or downregulated under both experiments. A Bayesian methodology is proposed based on directional multiple hypothesis testing. We propose a false discovery rate specific to the problem under consideration and construct a Bayes rule satisfying a false discovery rate criterion. The proposed method is compared with a traditional rule through simulation studies. We apply our methodology to two real examples involving microRNAs: in one, the targeted genes are simultaneously downregulated under both experiments; in the other, the targeted genes are downregulated in one experiment and upregulated in the other. We also discuss how the proposed methodology can be extended to more than two experiments.


Subject(s)
Bayes Theorem , Gene Expression Profiling/methods , Gene Expression Regulation , MicroRNAs/genetics , Models, Statistical , Algorithms , Computer Simulation , Databases, Genetic , Down-Regulation/genetics , Humans , Up-Regulation/genetics
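
Standard Bayesian-FDR machinery of the kind such rules build on can be sketched as follows (a Newton-style selection rule under our assumptions, not necessarily the authors' exact criterion): rank genes by the posterior probability of being outside the target directional category and take the largest set whose running mean stays below the level.

    import numpy as np

    def bayes_fdr_select(p_null, alpha=0.05):
        # p_null: per-gene posterior probability of NOT belonging to the
        # target directional category (e.g., not simultaneously downregulated)
        p_null = np.asarray(p_null)
        order = np.argsort(p_null)                 # most promising first
        running = np.cumsum(p_null[order]) / np.arange(1, len(p_null) + 1)
        k = np.searchsorted(running, alpha, side="right")
        selected = np.zeros(len(p_null), dtype=bool)
        selected[order[:k]] = True                 # posterior FDR <= alpha
        return selected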
17.
Biom J ; 56(6): 1035-54, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25231605

ABSTRACT

We consider the problem, treated by Simes, of testing the overall null hypothesis formed by the intersection of a set of elementary null hypotheses, based on the ordered p-values of the associated test statistics. The Simes test uses critical constants that do not need tabulation. Cai and Sarkar gave a method to compute generalized Simes critical constants that improve upon the power of the Simes test when more than a few hypotheses are false. The Simes constants can be viewed as first-order constants (requiring solution of a linear equation) and the Cai-Sarkar constants as second-order constants (requiring solution of a quadratic equation). We extend the method to third-order constants (requiring solution of a cubic equation) and also offer an extension to an arbitrary kth order. We show by simulation that the third-order constants are more powerful than the second-order constants for testing the overall null hypothesis in most cases. However, there are drawbacks associated with these higher-order constants, especially for k > 3, which limit their practical usefulness.


Subject(s)
Statistics as Topic/methods
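
The first-order case is easy to state in code. A sketch of the classical Simes test with constants c_i = iα/n; the Cai-Sarkar and higher-order constants require solving polynomial equations and are not shown, and the example p-values are invented.

    import numpy as np

    def simes_test(pvals, alpha=0.05):
        # reject the overall null if p_(i) <= i * alpha / n for some i
        p = np.sort(np.asarray(pvals))
        n = len(p)
        crit = alpha * np.arange(1, n + 1) / n
        return bool(np.any(p <= crit))

    print(simes_test([0.012, 0.03, 0.20, 0.74]))  # True: 0.012 <= 0.05/4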
18.
Front Public Health ; 1: 63, 2013.
Article in English | MEDLINE | ID: mdl-24350232

ABSTRACT

We review and compare multiple hypothesis testing procedures used in clinical trials and in genomic studies. Clinical trials often employ global tests, which draw an overall conclusion for all hypotheses, such as the SUM test, the Two-Step test, the Approximate Likelihood Ratio Test (ALRT), the Intersection-Union Test (IUT), and the MAX test. The SUM and Two-Step tests are most powerful under homogeneous treatment effects, while the ALRT and MAX test are robust in cases with non-homogeneous treatment effects; the ALRT is furthermore robust to unequal sample sizes across hypotheses. In genomic studies, stepwise procedures are used to draw marker-specific conclusions and to control the family-wise error rate (FWER) or the false discovery rate (FDR). FDR, the expected percentage of false positives among all significant results, is preferred over FWER when screening high-dimensional genomic markers because of its interpretability. In cases where correlations between test statistics cannot be ignored, the Westfall-Young resampling method generates the joint distribution of p-values under the null while maintaining their correlation structure. Finally, GWAS data from a clinical trial searching for SNPs associated with nephropathy among type 1 diabetic patients are used to illustrate the various procedures.
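
The Westfall-Young idea mentioned above, in single-step (maxT) sketch form; generating the permuted statistics is the user's responsibility and depends on the study design, so t_perm is an assumed input here.

    import numpy as np

    def westfall_young_maxT(t_obs, t_perm):
        # t_obs: (m,) observed statistics; t_perm: (B, m) statistics
        # recomputed on permuted data, preserving correlation across tests
        max_null = np.max(np.abs(t_perm), axis=1)   # (B,) max per permutation
        # adjusted p-value: how often the permutation maximum beats each test
        return np.array([np.mean(max_null >= t) for t in np.abs(t_obs)])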

19.
Ann Stat ; 39(1): 556-583, 2011 Feb.
Article in English | MEDLINE | ID: mdl-25018568

ABSTRACT

Improved procedures, in terms of smaller missed discovery rates (MDR), are developed and studied for performing multiple hypothesis testing with weak and strong control of the family-wise error rate (FWER) or the false discovery rate (FDR). The improvement over existing procedures, such as the Sidák procedure for FWER control and the Benjamini-Hochberg (BH) procedure for FDR control, is achieved by exploiting differences in the powers of the individual tests. The results signal the need to take the powers of the individual tests into account and to have multiple hypothesis decision functions that are not limited to using only the individual p-values, as is the case, for example, with the Sidák, Bonferroni, or BH procedures. They also enhance understanding of the role of the powers of individual tests, or more precisely the receiver operating characteristic (ROC) functions of decision processes, in the search for better multiple hypothesis testing procedures. A decision-theoretic framework is utilized, and through auxiliary randomizers the procedures can be used with discrete or mixed-type data or with rank-based nonparametric tests. This is in contrast to existing p-value based procedures, whose theoretical validity is contingent on each p-value statistic being stochastically equal to or greater than a standard uniform variable under the null hypothesis. The proposed procedures are relevant in the analysis of high-dimensional "large M, small n" data sets arising in the natural, physical, medical, economic, and social sciences, whose generation is accelerated by advances in high-throughput technology, notably, but not limited to, microarray technology.
