Results 1 - 20 of 56
1.
J R Stat Soc Series B Stat Methodol ; 86(2): 411-434, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38746015

ABSTRACT

Mediation analysis aims to assess if, and how, a certain exposure influences an outcome of interest through intermediate variables. This problem has recently gained a surge of attention due to the tremendous need for such analyses in scientific fields. Testing for the mediation effect (ME) is greatly challenged by the fact that the underlying null hypothesis (i.e. the absence of MEs) is composite. Most existing mediation tests are overly conservative and thus underpowered. To overcome this significant methodological hurdle, we develop an adaptive bootstrap testing framework that can accommodate different types of composite null hypotheses in the mediation pathway analysis. Applied to the product of coefficients test and the joint significance test, our adaptive testing procedures provide type I error control under the composite null, resulting in much improved statistical power compared to existing tests. Both theoretical properties and numerical examples of the proposed methodology are discussed.
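
As context for what the adaptive bootstrap improves, here is a minimal R sketch of the classical product-of-coefficients test with a naive percentile bootstrap; this is not the authors' adaptive procedure, and the variable names and data-generating step are hypothetical.

```r
## Classical product-of-coefficients mediation test with a naive
## nonparametric bootstrap (illustrative sketch only).
set.seed(1)
n <- 200
x <- rnorm(n)                      # exposure
m <- 0.3 * x + rnorm(n)            # mediator
y <- 0.4 * m + 0.2 * x + rnorm(n)  # outcome

prod_coef <- function(dat) {
  a <- coef(lm(m ~ x, data = dat))["x"]      # exposure -> mediator
  b <- coef(lm(y ~ m + x, data = dat))["m"]  # mediator -> outcome
  unname(a * b)                              # mediation effect a * b
}

dat <- data.frame(x, m, y)
boot <- replicate(2000, prod_coef(dat[sample(n, replace = TRUE), ]))
## Two-sided percentile p-value; this test is conservative when a = b = 0
## (the composite null), which is precisely what the adaptive bootstrap fixes.
2 * min(mean(boot <= 0), mean(boot >= 0))
```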

2.
J Stat Softw ; 105, 2023.
Article in English | MEDLINE | ID: mdl-38586564

ABSTRACT

Recurrent event analyses have found a wide range of applications in biomedicine, public health, and engineering, among other fields, where study subjects may experience a sequence of events of interest during follow-up. The R package reReg offers a comprehensive collection of practical and easy-to-use tools for regression analysis of recurrent events, possibly in the presence of an informative terminal event. The regression framework is a general scale-change model that encompasses the popular Cox-type model, the accelerated rate model, and the accelerated mean model as special cases. Informative censoring is accommodated through a subject-specific frailty without any need for parametric specification. Different regression models are allowed for the recurrent event process and the terminal event. Also included are visualization and simulation tools.
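
A hypothetical usage sketch of the reReg interface described above; the Recur() response and the model/B arguments follow the package's documented interface, but the data frame and covariate names here are made up, and exact argument names should be verified against the package documentation for your version.

```r
## Hypothetical reReg usage sketch (data and covariate names are invented).
library(reReg)

## Long-format recurrent event data: one row per event or censoring time,
## with a subject id, a recurrent event indicator, and a terminal indicator.
fit <- reReg(Recur(t.stop, id, event, status) ~ x1 + x2,
             data  = mydata,   # hypothetical data frame
             model = "cox",    # Cox-type proportional rates submodel
             B     = 200)      # resampling replicates for variance estimation
summary(fit)

## Event-plot visualization provided by the package:
plotEvents(Recur(t.stop, id, event, status) ~ 1, data = mydata)
```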

3.
Multivariate Behav Res ; 58(2): 387-407, 2023.
Article in English | MEDLINE | ID: mdl-35086405

ABSTRACT

Differential item functioning (DIF) analysis refers to procedures that evaluate whether an item's characteristics differ across groups of persons after controlling for overall differences in performance. DIF is routinely evaluated as a screening step to ensure that items behave the same across groups. Currently, the majority of DIF studies focus predominantly on unidimensional IRT models, although multidimensional IRT (MIRT) models provide a powerful tool for enriching the information gained in modern assessment. In this study, we explore regularization methods for DIF detection in MIRT models and compare their performance to the classic likelihood ratio test. Regularization methods have recently emerged as a new family of methods for DIF detection because of two advantages: (1) they bypass the tedious iterative purification procedure often needed in other methods for identifying anchor items, and (2) they can handle multiple covariates simultaneously. The specific regularization methods considered in the study are lasso with the expectation-maximization (EM) algorithm, lasso with the expectation-maximization-maximization (EMM) algorithm, and adaptive lasso with EM. Simulation results show that lasso EMM and adaptive lasso EM hold great promise when the sample size is large, and both outperform lasso EM. A real data example from the PROMIS depression and anxiety scales is presented at the end.


Subject(s)
Algorithms; Likelihood Functions
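
The core lasso ingredient inside such EM-type algorithms is soft-thresholding of the candidate DIF parameters at each M-step; items shrunk exactly to zero are treated as DIF-free anchors, which is how regularization avoids iterative purification. A minimal sketch, with made-up values:

```r
## Soft-thresholding operator applied to hypothetical M-step updates of
## per-item DIF parameters (a sketch of the lasso step, not the full EM).
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

dif_estimates <- c(0.05, -0.80, 0.10, 0.45, -0.02)  # hypothetical values
soft_threshold(dif_estimates, lambda = 0.15)
## Items 1, 3, and 5 are shrunk to zero (treated as anchor items);
## items 2 and 4 retain nonzero DIF parameters.
```
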
4.
Biometrics ; 78(1): 261-273, 2022 Mar.
Article in English | MEDLINE | ID: mdl-33215683

ABSTRACT

A central but challenging problem in genetic studies is to test for (usually weak) associations between a complex trait (e.g., a disease status) and sets of multiple genetic variants. Because no uniformly most powerful test exists, data-adaptive tests, such as the adaptive sum of powered score (aSPU) test, are advantageous in maintaining high power against a wide range of alternatives. However, the p-values of many adaptive tests, including aSPU, often have no accurate closed-form expression and are instead computed by Monte Carlo (MC) simulation, which can be prohibitively time-consuming at the stringent significance levels (e.g., 5e-8) used in genome-wide association studies (GWAS): estimating such a small p-value requires a huge number of MC simulations (e.g., 1e+10). As an alternative, we propose using importance sampling to speed up such calculations. We develop theory to motivate a proposed algorithm for the aSPU test and show that the proposed method is computationally more efficient than standard MC simulations. Using both simulated and real data, we demonstrate the superior performance of the new method over standard MC simulations.


Subject(s)
Genome-Wide Association Study; Polymorphism, Single Nucleotide; Algorithms; Genome-Wide Association Study/methods; Monte Carlo Method
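
A generic R sketch of the importance-sampling idea (not the authors' aSPU algorithm): estimate a tiny tail probability by sampling from a proposal tilted toward the rare region and reweighting, instead of waiting for rare crude Monte Carlo hits.

```r
## Importance sampling for a small normal tail probability P(Z > z),
## using a proposal N(z, 1) centered at the threshold.
set.seed(1)
z <- 5.6                                   # roughly GWAS-scale threshold
B <- 1e5
w <- rnorm(B, mean = z)                    # draws from the tilted proposal
weights <- dnorm(w) / dnorm(w, mean = z)   # likelihood ratio f(w) / g(w)
est <- mean((w > z) * weights)
c(importance = est, exact = pnorm(z, lower.tail = FALSE))
## Crude MC would need on the order of 1 / pnorm(-z), about 1e8 draws,
## to see even a single exceedance; importance sampling needs far fewer.
```
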
5.
Multivariate Behav Res ; 57(5): 840-858, 2022.
Article in English | MEDLINE | ID: mdl-33755507

ABSTRACT

Cognitive diagnosis models (CDMs) are useful statistical tools for providing rich information relevant to intervention and learning. As a popular approach to estimating and drawing inferences from CDMs, the Markov chain Monte Carlo (MCMC) algorithm is widely used in practice. However, when the number of attributes, K, is large, the existing MCMC algorithm may become time-consuming, because O(2^K) calculations are usually needed in the MCMC sampling process to obtain the conditional distribution for each attribute profile. To overcome this computational issue, motivated by Culpepper and Hudson's earlier work in 2018, we propose a computationally efficient sequential Gibbs sampling method that needs only O(K) calculations to sample each attribute profile. We use simulation and real data examples to show the good finite-sample performance of the proposed sequential Gibbs sampling and its advantage over existing methods.


Subject(s)
Algorithms; Cognition; Bayes Theorem; Computer Simulation; Markov Chains; Monte Carlo Method
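
A sketch of the per-attribute Gibbs idea: update one binary attribute at a time from its Bernoulli full conditional, so a sweep costs O(K) likelihood evaluations rather than enumerating all 2^K profiles. Here `loglik` is a hypothetical placeholder for the CDM likelihood of a respondent's responses given an attribute profile.

```r
## One sequential Gibbs sweep over a binary attribute profile `alpha`,
## assuming a flat prior on each attribute for simplicity.
gibbs_sweep <- function(alpha, loglik) {
  K <- length(alpha)
  for (k in 1:K) {
    a1 <- replace(alpha, k, 1)    # profile with attribute k mastered
    a0 <- replace(alpha, k, 0)    # profile with attribute k not mastered
    l1 <- loglik(a1)
    l0 <- loglik(a0)
    p1 <- 1 / (1 + exp(l0 - l1))  # P(alpha_k = 1 | all other attributes)
    alpha[k] <- rbinom(1, 1, p1)
  }
  alpha
}
```
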
6.
Ann Stat ; 49(1): 154-181, 2021 Feb.
Article in English | MEDLINE | ID: mdl-34857975

ABSTRACT

Many high-dimensional hypothesis tests aim to globally examine marginal or low-dimensional features of a high-dimensional joint distribution, such as testing of mean vectors, covariance matrices and regression coefficients. This paper constructs a family of U-statistics as unbiased estimators of the ℓp-norms of those features. We show that under the null hypothesis, the U-statistics of different finite orders are asymptotically independent and normally distributed. Moreover, they are also asymptotically independent of the maximum-type test statistic, whose limiting distribution is an extreme value distribution. Based on the asymptotic independence property, we propose an adaptive testing procedure which combines p-values computed from the U-statistics of different orders. We further establish power analysis results and show that the proposed adaptive procedure maintains high power against various alternatives.
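
The simplest member of such a family, shown as an R sketch below, is the order-two U-statistic for the mean-vector problem: (1 / (n(n-1))) * sum over i ≠ j of x_i' x_j, which is unbiased for the squared ℓ2-norm of the mean and so tests H0: mu = 0. The adaptive procedure combines p-values from several such statistics of different orders; this sketch computes only the one estimator.

```r
## Unbiased order-two U-statistic for ||mu||_2^2, computed in O(n p) time
## via the identity sum_{i != j} x_i' x_j = ||sum_i x_i||^2 - sum_i ||x_i||^2.
u2_norm <- function(X) {
  n <- nrow(X)
  s <- colSums(X)                       # p-vector: sum of the rows
  (sum(s^2) - sum(X^2)) / (n * (n - 1))
}

set.seed(1)
X <- matrix(rnorm(100 * 50), 100, 50)   # n = 100, p = 50, true mean zero
u2_norm(X)                              # near 0 under H0; near ||mu||^2 otherwise
```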

7.
Stat Sin ; 30: 1773-1795, 2020.
Article in English | MEDLINE | ID: mdl-34385810

ABSTRACT

Two major challenges arise in regression analyses of recurrent event data: first, popular existing models, such as the Cox proportional rates model, may not fully capture the covariate effects on the underlying recurrent event process; second, the censoring time remains informative about the risk of experiencing recurrent events after accounting for covariates. We tackle both challenges by a general class of semiparametric scale-change models that allow a scale-change covariate effect as well as a multiplicative covariate effect. The proposed model is flexible and includes several existing models as special cases, such as the popular proportional rates model, the accelerated mean model, and the accelerated rate model. Moreover, it accommodates informative censoring through a subject-level latent frailty whose distribution is left unspecified. A robust estimation procedure which requires neither a parametric assumption on the distribution of the frailty nor a Poisson assumption on the recurrent event process is proposed to estimate the model parameters. The asymptotic properties of the resulting estimator are established, with the asymptotic variance estimated from a novel resampling approach. As a byproduct, the structure of the model provides a model selection approach among the submodels via hypothesis testing of model parameters. Numerical studies show that the proposed estimator and the model selection procedure perform well under both noninformative and informative censoring scenarios. The methods are applied to data from two transplant cohorts to study the risk of infections after transplantation.
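
For concreteness, the general class described above can be written as follows, in our own notation following the standard formulation of joint scale-change models; the exact parameterization should be checked against the paper.

```latex
% General scale-change rate model: covariates X, nonnegative frailty Z
% with unspecified distribution, and baseline rate function \lambda_0.
\lambda(t \mid X, Z) = Z \, \lambda_0\!\left( t\, e^{X^\top \alpha} \right) e^{X^\top \beta},
\qquad t \in [0, \tau].
% Special cases: \alpha = 0 recovers the proportional rates (Cox-type)
% model; \beta = 0 the accelerated rate model; and \alpha = \beta the
% accelerated mean model. Hypothesis tests on \alpha and \beta therefore
% double as model selection among the submodels.
```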

8.
Int Stat Rev ; 87(1): 24-43, 2019 Apr.
Article in English | MEDLINE | ID: mdl-34366547

ABSTRACT

Panel count data arise in many applications when the event history of a recurrent event process is only examined at a sequence of discrete time points. In spite of the recent methodological developments, the availability of their software implementations has been rather limited. Focusing on a practical setting where the effects of some time-independent covariates on the recurrent events are of primary interest, we review semiparametric regression modelling approaches for panel count data that have been implemented in R package spef. The methods are grouped into two categories depending on whether the examination times are associated with the recurrent event process after conditioning on covariates. The reviewed methods are illustrated with a subset of the data from a skin cancer clinical trial.
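
A hypothetical usage sketch of the spef interface reviewed above; the panelReg() call and PanelSurv() response follow the package's documented pattern, but the data set, covariates, and chosen method string are invented and should be checked against the package documentation.

```r
## Hypothetical spef usage sketch (data and covariate names are invented).
library(spef)

## Panel count data: one row per examination, recording the subject ID,
## the examination time, and the event count since the last examination.
fit <- panelReg(PanelSurv(ID, time, count) ~ trt + age,
                data   = mydata,  # hypothetical data frame
                method = "AEE")   # augmented estimating equations
summary(fit)
```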

9.
Genet Epidemiol ; 41(7): 599-609, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28714590

ABSTRACT

Testing for association between two random vectors is a common and important task in many fields. However, existing tests, such as Escoufier's RV test, are suitable only for low-dimensional data, not for high-dimensional data. In moderate to high dimensions, it is necessary to consider sparse signals, in which only a few, rather than many, variables are expected to be associated with each other. We generalize the RV test to moderate-to-high dimensions. The key idea is to data-adaptively weight each variable pair based on its empirical association. As a consequence, the proposed test is adaptive, alleviates the effects of noise accumulation in high-dimensional data, and thus maintains power for both dense and sparse alternative hypotheses. We show connections between the proposed test and several existing tests, such as a generalized estimating equations-based adaptive test, multivariate kernel machine regression (KMR), and kernel distance methods. Furthermore, we modify the proposed adaptive test so that it can be powerful for nonlinear or nonmonotonic associations. We use both real and simulated data to demonstrate the advantages and usefulness of the proposed new test. The new test is freely available in the R package aSPC on CRAN at https://cran.r-project.org/web/packages/aSPC/index.html and at https://github.com/jasonzyx/aSPC.


Subject(s)
Computational Biology/methods; Gene Expression Regulation; Models, Statistical; Polymorphism, Single Nucleotide; Software; Computer Simulation; Humans; Transcriptome
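
A base-R sketch of the idea behind the test (not the aSPC package API): a sum of powered correlations over all variable pairs, combined adaptively over powers via a permutation min-p. Higher even powers weight strong pairs more heavily, which helps under sparse alternatives. The Bonferroni combination at the end is a crude stand-in for the permutation null of the minimum p-value used in practice.

```r
## Sum of powered correlations over all (X_j, Y_k) pairs.
spc_stat <- function(X, Y, gamma) sum(cor(X, Y)^gamma)

## Adaptive combination across powers via permutation.
aspc_test <- function(X, Y, gammas = c(1, 2, 4, 8), B = 500) {
  obs <- sapply(gammas, spc_stat, X = X, Y = Y)
  perm <- replicate(B, {
    Xp <- X[sample(nrow(X)), , drop = FALSE]   # break the X-Y association
    sapply(gammas, spc_stat, X = Xp, Y = Y)
  })                                           # length(gammas) x B matrix
  p_each <- rowMeans(sweep(abs(perm), 1, abs(obs), ">="))
  min(1, min(p_each) * length(gammas))         # crude Bonferroni on min-p
}
```
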
10.
Biometrics ; 74(3): 944-953, 2018 Sep.
Article in English | MEDLINE | ID: mdl-29286532

ABSTRACT

Panel count data arise when the number of recurrent events experienced by each subject is observed intermittently at discrete examination times. The examination time process can be informative about the underlying recurrent event process even after conditioning on covariates. We consider a semiparametric accelerated mean model for the recurrent event process and allow the two processes to be correlated through a shared frailty. The regression parameters have a simple marginal interpretation of modifying the time scale of the cumulative mean function of the event process. A novel estimation procedure for the regression parameters and the baseline rate function is proposed based on a conditioning technique. In contrast to existing methods, the proposed method is robust in the sense that it requires neither the strong Poisson-type assumption for the underlying recurrent event process nor a parametric assumption on the distribution of the unobserved frailty. Moreover, the distribution of the examination time process is left unspecified, allowing for arbitrary dependence between the two processes. Asymptotic consistency of the estimator is established, and the variance of the estimator is estimated by a model-based smoothed bootstrap procedure. Numerical studies demonstrate that the proposed point estimator and variance estimator perform well with practical sample sizes. The methods are applied to data from a skin cancer chemoprevention trial.


Subject(s)
Statistics as Topic/methods; Time Factors; Chemoprevention/methods; Chemoprevention/statistics & numerical data; Clinical Trials as Topic; Computer Simulation; Recurrence; Regression Analysis; Sample Size; Skin Neoplasms/prevention & control
11.
Stat Med ; 37(6): 996-1008, 2018 Mar 15.
Article in English | MEDLINE | ID: mdl-29171035

ABSTRACT

Alternating recurrent event data arise frequently in clinical and epidemiologic studies, where 2 types of events such as hospital admission and discharge occur alternately over time. The 2 alternating states defined by these recurrent events could each carry important and distinct information about a patient's underlying health condition and/or the quality of care. In this paper, we propose a semiparametric method for evaluating covariate effects on the 2 alternating states jointly. The proposed methodology accounts for the dependence among the alternating states as well as the heterogeneity across patients via a frailty with unspecified distribution. Moreover, the estimation procedure, which is based on smooth estimating equations, not only properly addresses challenges such as induced dependent censoring and intercept sampling bias commonly confronted in serial event gap time data but also is more computationally tractable than the existing rank-based methods. The proposed methods are evaluated by simulation studies and illustrated by analyzing psychiatric contacts from the South Verona Psychiatric Case Register.


Subject(s)
Biometry/methods; Regression Analysis; Adolescent; Adult; Aged; Aged, 80 and over; Computer Simulation; Female; Hospitalization; Humans; Italy; Male; Mental Disorders; Middle Aged; Recurrence; Registries; Risk Factors; Time Factors; Young Adult
12.
Stat Med ; 37(7): 1086-1100, 2018 Mar 30.
Article in English | MEDLINE | ID: mdl-29205446

ABSTRACT

Various semiparametric regression models have recently been proposed for the analysis of gap times between consecutive recurrent events. Among them, the semiparametric accelerated failure time (AFT) model is especially appealing owing to its direct interpretation of covariate effects on the gap times. In general, estimation of the semiparametric AFT model is challenging because the rank-based estimating function is a nonsmooth step function. As a result, solutions to the estimating equations do not necessarily exist. Moreover, the popular resampling-based variance estimation for the AFT model requires solving rank-based estimating equations repeatedly and hence can be computationally cumbersome and unstable. In this paper, we extend the induced smoothing approach to the AFT model for recurrent gap time data. Our proposed smooth estimating function permits the application of standard numerical methods for both the regression coefficients estimation and the standard error estimation. Large-sample properties and an asymptotic variance estimator are provided for the proposed method. Simulation studies show that the proposed method outperforms the existing nonsmooth rank-based estimating function methods in both point estimation and variance estimation. The proposed method is applied to the data analysis of repeated hospitalizations for patients in the Danish Psychiatric Center Register.


Subject(s)
Biometry/methods; Recurrence; Regression Analysis; Computer Simulation; Denmark; Hospitalization; Humans; Mental Disorders; Patient Readmission; Registries; Time Factors
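
A sketch of the induced-smoothing idea, simplified to the classical non-recurrent AFT setting: the Gehan estimating function's indicator I(e_i <= e_j) is replaced by the smooth normal c.d.f. pnorm((e_j - e_i) / r), so standard root-finders apply. In the actual method the smoothing scale is pair-specific; a fixed r is used here purely for illustration.

```r
## Smoothed Gehan estimating function for the AFT model
## log T = X' beta + error, with censoring indicator delta.
smooth_gehan <- function(beta, X, logT, delta, r = 0.1) {
  e <- as.vector(logT - X %*% beta)  # AFT residuals e_i = log T_i - X_i' beta
  U <- rep(0, ncol(X))
  for (i in which(delta == 1)) {     # only observed events contribute
    w <- pnorm((e - e[i]) / r)       # smooth surrogate for I(e_i <= e_j)
    ## sum_j w_j (X_i - X_j) = sum(w) X_i - X' w
    U <- U + sum(w) * X[i, ] - as.vector(t(X) %*% w)
  }
  U / length(e)^2
}

## A root in beta can be found with a standard solver, e.g. by minimizing
## sum(smooth_gehan(beta, X, logT, delta)^2) with optim().
```
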
13.
Appl Psychol Meas ; 41(8): 579-599, 2017.
Article in English | MEDLINE | ID: mdl-29033476

ABSTRACT

Large-scale assessments are supported by a large item pool. An important task in test development is to assign items into scales that measure different characteristics of individuals, and a popular approach is cluster analysis of items. Classical methods in cluster analysis, such as the hierarchical clustering, K-means method, and latent-class analysis, often induce a high computational overhead and have difficulty handling missing data, especially in the presence of high-dimensional responses. In this article, the authors propose a spectral clustering algorithm for exploratory item cluster analysis. The method is computationally efficient, effective for data with missing or incomplete responses, easy to implement, and often outperforms traditional clustering algorithms in the context of high dimensionality. The spectral clustering algorithm is based on graph theory, a branch of mathematics that studies the properties of graphs. The algorithm first constructs a graph of items, characterizing the similarity structure among items. It then extracts item clusters based on the graphical structure, grouping similar items together. The proposed method is evaluated through simulations and an application to the revised Eysenck Personality Questionnaire.
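
A minimal base-R sketch of spectral clustering of items, in a generic textbook form rather than the authors' exact algorithm; the similarity matrix S is assumed to have positive row sums and could, for example, be built from absolute correlations on the available (possibly incomplete) responses.

```r
## Generic spectral clustering of items from an item-by-item similarity
## matrix S into k clusters.
spectral_items <- function(S, k) {
  d <- rowSums(S)
  L <- diag(1 / sqrt(d)) %*% S %*% diag(1 / sqrt(d))  # normalized affinity
  E <- eigen(L, symmetric = TRUE)$vectors[, 1:k]      # top-k eigenvectors
  E <- E / sqrt(rowSums(E^2))                         # row-normalize embedding
  kmeans(E, centers = k, nstart = 20)$cluster
}

## Example with pairwise-complete correlations as similarities:
## S <- abs(cor(responses, use = "pairwise.complete.obs"))
## clusters <- spectral_items(S, k = 3)
```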

15.
Psychometrika ; 89(2): 717-740, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38517594

ABSTRACT

Cognitive diagnosis models (CDMs) provide a powerful statistical and psychometric tool for researchers and practitioners to learn fine-grained diagnostic information about respondents' latent attributes. There has been a growing interest in the use of CDMs for polytomous response data, as more and more items with multiple response options become widely used. Similar to many latent variable models, the identifiability of CDMs is critical for accurate parameter estimation and valid statistical inference. However, the existing identifiability results are primarily focused on binary response models and have not adequately addressed the identifiability of CDMs with polytomous responses. This paper addresses this gap by presenting sufficient and necessary conditions for the identifiability of the widely used DINA model with polytomous responses, with the aim to provide a comprehensive understanding of the identifiability of CDMs with polytomous responses and to inform future research in this field.


Subject(s)
Models, Statistical; Psychometrics; Humans; Psychometrics/methods; Cognition
16.
Appl Psychol Meas ; 48(6): 276-294, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39166181

ABSTRACT

Accurate item parameters and standard errors (SEs) are crucial for many multidimensional item response theory (MIRT) applications. A recent study proposed the Gaussian Variational Expectation Maximization (GVEM) algorithm to improve computational efficiency and estimation accuracy (Cho et al., 2021). However, the SE estimation procedure has yet to be fully addressed. To tackle this issue, the present study proposed an updated supplemented expectation maximization (USEM) method and a bootstrap method for SE estimation. These two methods were compared in terms of SE recovery accuracy. The simulation results demonstrated that the GVEM algorithm with bootstrap and item priors (GVEM-BSP) outperformed the other methods, exhibiting less bias and relative bias for SE estimates under most conditions. Although the GVEM with USEM (GVEM-USEM) was the most computationally efficient method, it yielded an upward bias for SE estimates.
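
A generic sketch of the bootstrap-SE idea compared in the study (not the GVEM-BSP implementation): refit the model on resampled data and take the standard deviation of the replicate estimates as the standard error. Here `fit_mirt` is a hypothetical stand-in for one model fit returning a vector of item parameter estimates.

```r
## Nonparametric bootstrap SEs for item parameters (illustrative sketch).
bootstrap_se <- function(data, fit_mirt, B = 100) {
  est <- replicate(B, fit_mirt(data[sample(nrow(data), replace = TRUE), ]))
  apply(est, 1, sd)  # one SE per item parameter
}
```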

17.
Psychometrika ; 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38429494

ABSTRACT

Multidimensional item response theory (MIRT) models have generated increasing interest in the psychometrics literature. Efficient approaches for estimating MIRT models with dichotomous responses have been developed, but constructing an equally efficient and robust algorithm for polytomous models has received limited attention. To address this gap, this paper presents a novel Gaussian variational estimation algorithm for the multidimensional generalized partial credit model. The proposed algorithm demonstrates both fast and accurate performance, as illustrated through a series of simulation studies and two real data analyses.

18.
Psychometrika ; 2024 May 30.
Article in English | MEDLINE | ID: mdl-38814412

ABSTRACT

With the growing attention on large-scale educational testing and assessment, the ability to process substantial volumes of response data becomes crucial. Current estimation methods within item response theory (IRT), despite their high precision, often pose considerable computational burdens with large-scale data, leading to reduced computational speed. This study introduces a novel "divide-and-conquer" parallel algorithm built on the Wasserstein posterior approximation concept, aiming to enhance computational speed while maintaining accurate parameter estimation. This algorithm enables drawing parameters from segmented data subsets in parallel, followed by an amalgamation of these parameters via Wasserstein posterior approximation. Theoretical support for the algorithm is established through asymptotic optimality under certain regularity assumptions. Practical validation is demonstrated using real-world data from the Programme for International Student Assessment. Ultimately, this research proposes a transformative approach to managing educational big data, offering a scalable, efficient, and precise alternative that promises to redefine traditional practices in educational assessments.
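
A sketch of the combination step for a single scalar parameter, as our illustration rather than the paper's full algorithm: for one-dimensional distributions, the 2-Wasserstein barycenter of the subset posteriors is obtained by averaging their quantile functions.

```r
## Combine shard-level MCMC draws of one scalar parameter via the
## 1-D 2-Wasserstein barycenter (quantile averaging).
combine_wasserstein <- function(draws_list, m = 1000) {
  probs <- (1:m - 0.5) / m
  Q <- sapply(draws_list, quantile, probs = probs)  # m x (number of shards)
  rowMeans(Q)  # barycenter quantiles ~ draws from the combined posterior
}

## draws_list would hold draws of the same parameter from each data shard,
## e.g. draws_list <- list(shard1_draws, shard2_draws, shard3_draws).
```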

19.
Stat Sin ; 23, 2013.
Article in English | MEDLINE | ID: mdl-24307816

ABSTRACT

Sellke and Siegmund (1983) developed the Brownian approximation to the Cox partial likelihood score as a process of calendar time, laying the foundation for group sequential analysis of survival studies. We extend their results to cover situations in which treatment allocations may depend on observed outcomes. The new development makes use of the entry time and calendar time along with the corresponding σ-filtrations to handle the natural information accumulation. Large sample properties are established under suitable regularity conditions.

20.
Bernoulli (Andover) ; 19(5A): 1790-1817, 2013 Nov 01.
Article in English | MEDLINE | ID: mdl-24812537

ABSTRACT

Cognitive assessment is a growing area in psychological and educational measurement, where tests are given to assess mastery/deficiency of attributes or skills. A key issue is the correct identification of attributes associated with items in a test. In this paper, we set up a mathematical framework under which theoretical properties may be discussed. We establish sufficient conditions to ensure that the attributes required by each item are learnable from the data.
