Search | VHL Search Portal

1.

High-dimensional covariate-augmented overdispersed poisson factor model.

Liu, Wei; Zhong, Qingzhi.

Biometrics ; 80(2)2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38682464

ABSTRACT

The current Poisson factor models often assume that the factors are unknown, which overlooks the explanatory potential of certain observable covariates. This study focuses on high dimensional settings, where the number of the count response variables and/or covariates can diverge as the sample size increases. A covariate-augmented overdispersed Poisson factor model is proposed to jointly perform a high-dimensional Poisson factor analysis and estimate a large coefficient matrix for overdispersed count data. A group of identifiability conditions is provided to theoretically guarantee computational identifiability. We incorporate the interdependence of both response variables and covariates by imposing a low-rank constraint on the large coefficient matrix. To address the computation challenges posed by nonlinearity, two high-dimensional latent matrices, and the low-rank constraint, we propose a novel variational estimation scheme that combines Laplace and Taylor approximations. We also develop a criterion based on a singular value ratio to determine the number of factors and the rank of the coefficient matrix. Comprehensive simulation studies demonstrate that the proposed method outperforms the state-of-the-art methods in estimation accuracy and computational efficiency. The practical merit of our method is demonstrated by an application to the CITE-seq dataset. A flexible implementation of our proposed method is available in the R package COAP.

Subject(s)

Computer Simulation , Models, Statistical , Poisson Distribution , Humans , Sample Size , Biometry/methods , Factor Analysis, Statistical

2.

Conditional modeling of panel count data with partly interval-censored failure event.

Hu, Xiangbin; Su, Wen; Ye, Zhisheng; Zhao, Xingqiu.

Biometrics ; 80(1)2024 Jan 29.

Article in English | MEDLINE | ID: mdl-38497823

ABSTRACT

In longitudinal follow-up studies, panel count data arise from discrete observations on recurrent events. We investigate a more general situation where a partly interval-censored failure event is informative to recurrent events. The existing methods for the informative failure event are based on the latent variable model, which provides indirect interpretation for the effect of failure event. To solve this problem, we propose a failure-time-dependent proportional mean model with panel count data through an unspecified link function. For estimation of model parameters, we consider a conditional expectation of least squares function to overcome the challenges from partly interval-censoring, and develop a two-stage estimation procedure by treating the distribution function of the failure time as a functional nuisance parameter and using the B-spline functions to approximate unknown baseline mean and link functions. Furthermore, we derive the overall convergence rate of the proposed estimators and establish the asymptotic normality of finite-dimensional estimator and functionals of infinite-dimensional estimator. The proposed estimation procedure is evaluated by extensive simulation studies, in which the finite-sample performances coincide with the theoretical results. We further illustrate our method with a longitudinal healthy longevity study and draw some insightful conclusions.

Subject(s)

Health Status , Computer Simulation

3.

An interpretable Bayesian clustering approach with feature selection for analyzing spatially resolved transcriptomics data.

Li, Huimin; Zhu, Bencong; Jiang, Xi; Guo, Lei; Xie, Yang; Xu, Lin; Li, Qiwei.

Biometrics ; 80(3)2024 Jul 01.

Article in English | MEDLINE | ID: mdl-39073775

ABSTRACT

Recent breakthroughs in spatially resolved transcriptomics (SRT) technologies have enabled comprehensive molecular characterization at the spot or cellular level while preserving spatial information. Cells are the fundamental building blocks of tissues, organized into distinct yet connected components. Although many non-spatial and spatial clustering approaches have been used to partition the entire region into mutually exclusive spatial domains based on the SRT high-dimensional molecular profile, most require an ad hoc selection of less interpretable dimensional-reduction techniques. To overcome this challenge, we propose a zero-inflated negative binomial mixture model to cluster spots or cells based on their molecular profiles. To increase interpretability, we employ a feature selection mechanism to provide a low-dimensional summary of the SRT molecular profile in terms of discriminating genes that shed light on the clustering result. We further incorporate the SRT geospatial profile via a Markov random field prior. We demonstrate how this joint modeling strategy improves clustering accuracy, compared with alternative state-of-the-art approaches, through simulation studies and 3 real data applications.

Subject(s)

Bayes Theorem , Computer Simulation , Gene Expression Profiling , Cluster Analysis , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Humans , Transcriptome , Markov Chains , Models, Statistical , Data Interpretation, Statistical

4.

Simultaneous variable selection and estimation in semiparametric regression of mixed panel count data.

Ge, Lei; Hu, Tao; Li, Yang.

Biometrics ; 80(1)2024 Jan 29.

Article in English | MEDLINE | ID: mdl-38465988

ABSTRACT

Mixed panel count data represent a common complex data structure in longitudinal survey studies. A major challenge in analyzing such data is variable selection and estimation while efficiently incorporating both the panel count and panel binary data components. Analyses in the medical literature have often ignored the panel binary component and treated it as missing with the unknown panel counts, while obviously such a simplification does not effectively utilize the original data information. In this research, we put forward a penalized likelihood variable selection and estimation procedure under the proportional mean model. A computationally efficient EM algorithm is developed that ensures sparse estimation for variable selection, and the resulting estimator is shown to have the desirable oracle property. Simulation studies assessed and confirmed the good finite-sample properties of the proposed method, and the method is applied to analyze a motivating dataset from the Health and Retirement Study.

Subject(s)

Algorithms , Likelihood Functions , Computer Simulation , Longitudinal Studies

5.

Socio-environmental predictors of diabetes incidence disparities in Tanzania mainland: a comparison of regression models for count data.

Mbwambo, Sauda Hatibu; Mbago, Maurice C; Rao, Gadde Srinivasa.

BMC Med Res Methodol ; 24(1): 75, 2024 Mar 26.

Article in English | MEDLINE | ID: mdl-38532325

ABSTRACT

BACKGROUND: Diabetes is one of the top four non-communicable diseases that cause death and illness to many people around the world. This study aims to use an efficient count data model to estimate socio-environmental factors associated with diabetes incidences in Tanzania mainland, addressing lack of evidence on the efficient count data model for estimating factors associated with disease incidences disparities. METHODS: This study analyzed diabetes counts in 184 Tanzania mainland councils collected in 2020. The study applied generalized Poisson, negative binomial, and Poisson count data models and evaluated their adequacy using information criteria and Pearson chi-square values. RESULTS: The data were over-dispersed, as evidenced by the mean and variance values and the positively skewed histograms. The results revealed uneven distribution of diabetes incidence across geographical locations, with northern and urban councils having more cases. Factors like population, GDP, and hospital numbers were associated with diabetes counts. The GP model performed better than NB and Poisson models. CONCLUSION: The occurrence of diabetes can be attributed to geographical locations. To address this public health issue, environmental interventions can be implemented. Additionally, the generalized Poisson model is an effective tool for analyzing health information system count data across different population subgroups.

Subject(s)

Diabetes Mellitus , Models, Statistical , Humans , Incidence , Tanzania , Poisson Distribution

6.

Large-scale data decipher children's scale errors: A meta-analytic approach using the zero-inflated Poisson models.

Hagihara, Hiromichi; Ishibashi, Mikako; Moriguchi, Yusuke; Shinya, Yuta.

Dev Sci ; 27(4): e13499, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38544371

ABSTRACT

Scale errors are intriguing phenomena in which a child tries to perform an object-specific action on a tiny object. Several viewpoints explaining the developmental mechanisms underlying scale errors exist; however, there is no unified account of how different factors interact and affect scale errors, and the statistical approaches used in the previous research do not adequately capture the structure of the data. By conducting a secondary analysis of aggregated datasets across nine different studies (n = 528) and using more appropriate statistical methods, this study provides a more accurate description of the development of scale errors. We implemented the zero-inflated Poisson (ZIP) regression that could directly handle the count data with a stack of zero observations and regarded developmental indices as continuous variables. The results suggested that the developmental trend of scale errors was well documented by an inverted U-shaped curve rather than a simple linear function, although nonlinearity captured different aspects of the scale errors between the laboratory and classroom data. We also found that repeated experiences with scale error tasks reduced the number of scale errors, whereas girls made more scale errors than boys. Furthermore, a model comparison approach revealed that predicate vocabulary size (e.g., adjectives or verbs), predicted developmental changes in scale errors better than noun vocabulary size, particularly in terms of the presence or absence of scale errors. The application of the ZIP model enables researchers to discern how different factors affect scale error production, thereby providing new insights into demystifying the mechanisms underlying these phenomena. A video abstract of this article can be viewed at https://youtu.be/1v1U6CjDZ1Q RESEARCH HIGHLIGHTS: We fit a large dataset by aggregating the existing scale error data to the zero-inflated Poisson (ZIP) model. Scale errors peaked along the different developmental indices, but the underlying statistical structure differed between the in-lab and classroom datasets. Repeated experiences with scale error tasks and the children's gender affected the number of scale errors produced per session. Predicate vocabulary size (e.g., adjectives or verbs) better predicts developmental changes in scale errors than noun vocabulary size.

Subject(s)

Vocabulary , Humans , Poisson Distribution , Child , Female , Male , Child Development/physiology , Child, Preschool , Models, Statistical

7.

A novel correction method for modelling parameter-driven autocorrelated time series with count outcome.

Xu, Xiao-Han; Zhan, Zi-Shu; Shi, Chen; Xiao, Ting; Ou, Chun-Quan.

BMC Public Health ; 24(1): 901, 2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38539086

ABSTRACT

BACKGROUND: Count time series (e.g., daily deaths) are a very common type of data in environmental health research. The series is generally autocorrelated, while the widely used generalized linear model is based on the assumption of independent outcomes. None of the existing methods for modelling parameter-driven count time series can obtain consistent and reliable standard error of parameter estimates, causing potential inflation of type I error rate. METHODS: We proposed a new maximum significant ρ correction (MSRC) method that utilizes information of significant autocorrelation coefficient ρ estimate within 5 orders by moment estimation. A Monte Carlo simulation was conducted to evaluate and compare the finite sample performance of the MSRC and classical unbiased correction (UB-corrected) method. We demonstrated a real-data analysis for assessing the effect of drunk driving regulations on the incidence of road traffic injuries (RTIs) using MSRC in Shenzhen, China. Moreover, there is no previous paper assessing the time-varying intervention effect and considering autocorrelation based on daily data of RTIs. RESULTS: Both methods had a small bias in the regression coefficients. The autocorrelation coefficient estimated by UB-corrected is slightly underestimated at high autocorrelation (≥ 0.6), leading to the inflation of the type I error rate. The new method well controlled the type I error rate when the sample size reached 340. Moreover, the power of MSRC increased with increasing sample size and effect size and decreasing nuisance parameters, and it approached UB-corrected when ρ was small (≤ 0.4), but became more reliable as autocorrelation increased further. The daily data of RTIs exhibited significant autocorrelation after controlling for potential confounding, and therefore the MSRC was preferable to the UB-corrected. The intervention contributed to a decrease in the incidence of RTIs by 8.34% (95% CI, -5.69-20.51%), 45.07% (95% CI, 25.86-59.30%) and 42.94% (95% CI, 9.56-64.00%) at 1, 3 and 5 years after the implementation of the intervention, respectively. CONCLUSIONS: The proposed MSRC method provides a reliable and consistent approach for modelling parameter-driven time series with autocorrelated count data. It offers improved estimation compared to existing methods. The strict drunk driving regulations can reduce the risk of RTIs.

Subject(s)

Time Factors , Humans , Linear Models , Computer Simulation , Bias , China

8.

Importance of temporary and permanent snow for new second homes.

Falk, Martin Thomas; Hagsten, Eva; Lin, Xiang.

Int J Biometeorol ; 68(3): 581-593, 2024 Mar.

Article in English | MEDLINE | ID: mdl-36607447

ABSTRACT

This study investigates empirically how natural snow depth and permanent snow affect the number of new second homes in Norway. One out of four Norwegian municipalities is partly covered by glaciers and permanent snow. In the winter seasons of 1983-2020, there is a decline in snow depth from 50 to 35 cm on average (based on 41 popular second-home areas in the mountains). Results of the fixed effects Poisson estimator with spatial elements show that there is a significant and positive relationship between natural snow depth in the municipality and the number of second homes started. There is also a significant and negative relationship between the number of new second homes in the municipality and a scarcity of snow in the surrounding municipalities. However, the magnitude of both effects is small. Estimates also show a strong positive relationship between the proportion of surface covered by permanent snow or glaciers in the municipality and new second homes. This implies that a decline in permanent snow and glaciers may make these areas less attractive for the location of second homes.

Subject(s)

Environmental Monitoring , Snow , Environmental Monitoring/methods , Seasons , Ice Cover

9.

Understanding Ability and Reliability Differences Measured with Count Items: The Distributional Regression Test Model and the Count Latent Regression Model.

Beisemann, Marie; Forthmann, Boris; Doebler, Philipp.

Multivariate Behav Res ; 59(3): 502-522, 2024.

Article in English | MEDLINE | ID: mdl-38348679

ABSTRACT

In psychology and education, tests (e.g., reading tests) and self-reports (e.g., clinical questionnaires) generate counts, but corresponding Item Response Theory (IRT) methods are underdeveloped compared to binary data. Recent advances include the Two-Parameter Conway-Maxwell-Poisson model (2PCMPM), generalizing Rasch's Poisson Counts Model, with item-specific difficulty, discrimination, and dispersion parameters. Explaining differences in model parameters informs item construction and selection but has received little attention. We introduce two 2PCMPM-based explanatory count IRT models: The Distributional Regression Test Model for item covariates, and the Count Latent Regression Model for (categorical) person covariates. Estimation methods are provided and satisfactory statistical properties are observed in simulations. Two examples illustrate how the models help understand tests and underlying constructs.

Subject(s)

Models, Statistical , Humans , Regression Analysis , Reproducibility of Results , Computer Simulation/statistics & numerical data , Poisson Distribution , Psychometrics/methods , Data Interpretation, Statistical

10.

Counting the days: Exploring the post-mortem interval factors in sexual homicides.

Chai, April Miin Miin; Reale, Kylie S.

Behav Sci Law ; 42(4): 385-400, 2024.

Article in English | MEDLINE | ID: mdl-38762888

ABSTRACT

This study explores the offender, victim, and environmental characteristics that significantly influence the number of days a sexual homicide victim remains undiscovered. Utilizing a sample of 269 cases from the Homicide Investigation Tracking System database an in-depth analysis was conducted to unveil the factors contributing to the delay in the discovery of victims' bodies. The methodological approach involves applying a negative binomial regression analysis, which allows for the examination of count data, specifically addressing the over-dispersion and excess zeros in the dependent variable - the number of days until the victim is found. The findings reveal that certain offender characteristics, victim traits, and spatio-temporal factors play a pivotal role in the time lag experienced in locating the bodies of homicide victims. These findings have crucial implications for investigative efforts in homicide cases, offering valuable insights that can inform and enhance the efficacy and efficiency of future investigative procedures and strategies.

Subject(s)

Crime Victims , Homicide , Sex Offenses , Humans , Male , Female , Adult , Sex Offenses/psychology , Criminals/psychology , Middle Aged , Time Factors , Young Adult , Adolescent , Aged , Autopsy

11.

Mapping QTL controlling count traits with excess zeros and ones using a zero-and-one-inflated generalized Poisson regression model.

Chi, Jinling; Ye, Jimin; Zhou, Ying.

Biom J ; 66(3): e2200342, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38616336

ABSTRACT

The research on the quantitative trait locus (QTL) mapping of count data has aroused the wide attention of researchers. There are frequent problems in applied research that limit the application of the conventional Poisson model in the analysis of count phenotypes, which include the overdispersion and excess zeros and ones. In this article, a novel model, that is, the zero-and-one-inflated generalized Poisson (ZOIGP) model, is proposed to deal with these problems. Based on the proposed model, a score test is performed for the inflation parameter, in which the ZOIGP model with a constant proportion of excess zeros and ones is compared with a standard generalized Poisson model. To illustrate the practicability of the ZOIGP model, we extend it to the QTL interval mapping application that underpins count phenotype with excess zeros and excess ones. The genetic effects are estimated utilizing the expectation-maximization algorithm embedded with the Newton-Raphson algorithm, and the genome-wide scan and likelihood ratio test is performed to map and test the potential QTLs. The statistical properties exhibited by the proposed method are investigated through simulation. Finally, a real data analysis example is used to illustrate the utility of the proposed method for QTL mapping.

Subject(s)

Algorithms , Quantitative Trait Loci , Computer Simulation , Data Analysis , Phenotype

12.

A flexible time-varying coefficient rate model for panel count data.

Sun, Dayu; Guo, Yuanyuan; Li, Yang; Sun, Jianguo; Tu, Wanzhu.

Lifetime Data Anal ; 2024 May 28.

Article in English | MEDLINE | ID: mdl-38805094

ABSTRACT

Panel count regression is often required in recurrent event studies, where the interest is to model the event rate. Existing rate models are unable to handle time-varying covariate effects due to theoretical and computational difficulties. Mean models provide a viable alternative but are subject to the constraints of the monotonicity assumption, which tends to be violated when covariates fluctuate over time. In this paper, we present a new semiparametric rate model for panel count data along with related theoretical results. For model fitting, we present an efficient EM algorithm with three different methods for variance estimation. The algorithm allows us to sidestep the challenges of numerical integration and difficulties with the iterative convex minorant algorithm. We showed that the estimators are consistent and asymptotically normally distributed. Simulation studies confirmed an excellent finite sample performance. To illustrate, we analyzed data from a real clinical study of behavioral risk factors for sexually transmitted infections.

13.

Multilevel modeling in single-case studies with zero-inflated and overdispersed count data.

Li, Haoran; Luo, Wen; Baek, Eunkyeng.

Behav Res Methods ; 56(4): 2765-2781, 2024 04.

Article in English | MEDLINE | ID: mdl-38383801

ABSTRACT

Count outcomes are frequently encountered in single-case experimental designs (SCEDs). Generalized linear mixed models (GLMMs) have shown promise in handling overdispersed count data. However, the presence of excessive zeros in the baseline phase of SCEDs introduces a more complex issue known as zero-inflation, often overlooked by researchers. This study aimed to deal with zero-inflated and overdispersed count data within a multiple-baseline design (MBD) in single-case studies. It examined the performance of various GLMMs (Poisson, negative binomial [NB], zero-inflated Poisson [ZIP], and zero-inflated negative binomial [ZINB] models) in estimating treatment effects and generating inferential statistics. Additionally, a real example was used to demonstrate the analysis of zero-inflated and overdispersed count data. The simulation results indicated that the ZINB model provided accurate estimates for treatment effects, while the other three models yielded biased estimates. The inferential statistics obtained from the ZINB model were reliable when the baseline rate was low. However, when the data were overdispersed but not zero-inflated, both the ZINB and ZIP models exhibited poor performance in accurately estimating treatment effects. These findings contribute to our understanding of using GLMMs to handle zero-inflated and overdispersed count data in SCEDs. The implications, limitations, and future research directions are also discussed.

Subject(s)

Single-Case Studies as Topic , Humans , Linear Models , Multilevel Analysis/methods , Data Interpretation, Statistical , Models, Statistical , Poisson Distribution , Computer Simulation , Research Design

14.

Model selection of GLMMs in the analysis of count data in single-case studies: A Monte Carlo simulation.

Li, Haoran.

Behav Res Methods ; 56(7): 7963-7984, 2024 10.

Article in English | MEDLINE | ID: mdl-38987450

ABSTRACT

Generalized linear mixed models (GLMMs) have great potential to deal with count data in single-case experimental designs (SCEDs). However, applied researchers have faced challenges in making various statistical decisions when using such advanced statistical techniques in their own research. This study focused on a critical issue by investigating the selection of an appropriate distribution to handle different types of count data in SCEDs due to overdispersion and/or zero-inflation. To achieve this, I proposed two model selection frameworks, one based on calculating information criteria (AIC and BIC) and another based on utilizing a multistage-model selection procedure. Four data scenarios were simulated including Poisson, negative binominal (NB), zero-inflated Poisson (ZIP), and zero-inflated negative binomial (ZINB). The same set of models (i.e., Poisson, NB, ZIP, and ZINB) were fitted for each scenario. In the simulation, I evaluated 10 model selection strategies within the two frameworks by assessing the model selection bias and its consequences on the accuracy of the treatment effect estimates and inferential statistics. Based on the simulation results and previous work, I provide recommendations regarding which model selection methods should be adopted in different scenarios. The implications, limitations, and future research directions are also discussed.

Subject(s)

Monte Carlo Method , Linear Models , Humans , Single-Case Studies as Topic , Computer Simulation , Data Interpretation, Statistical , Models, Statistical , Poisson Distribution , Research Design

15.

Application of a count data model to evaluate the anti-metastatic efficacy of QAP14 in 4T1 breast cancer allografts.

Guo, Yuchen; Yong, Ling; Yao, Qingyu; Han, Mengyi; Xue, Junsheng; Jian, Weizhe; Zhou, Tianyan.

J Theor Biol ; 557: 111323, 2023 01 21.

Article in English | MEDLINE | ID: mdl-36273592

ABSTRACT

Dopamine D1 receptor (D1DR) is proved to be a promising target to prevent tumor metastasis, and our previous studies showed that QAP14, a potent anti-cancer agent, exerted inhibitory effect on lung metastasis via D1DR activation. Therefore, the purpose of the study was to establish count data models to quantitatively characterize the disease progression of lung metastasis and assess the anti-metastatic effect of QAP14. Data of metastatic progression were collected in 4T1 tumor-bearing mice. Generalized Poisson distribution best described the variability of metastasis counts among the individuals. An empirical PK/PD model was developed to establish mathematical relationships between steady plasma concentrations of QAP14 and metastasis growth dynamics. The latency period of metastasis was estimated to be 12 days after tumor implantation. Our model structure also fitted well to other D1DR agonists (fenoldopam and l-stepholidine) which had inhibitory impact on breast cancer lung metastasis likewise. QAP14 40 mg/kg showed the best inhibitory efficacy, for it provided the longest prolongation of metastasis-free periods compared with fenoldopam or l-stepholidine. This study provides a quantitative method to describe the lung metastasis progression of 4T1 allografts, as well as an alternative PD model structure to evaluate anti-metastatic efficacy.

Subject(s)

Fenoldopam , Lung Neoplasms , Mice , Animals , Cell Line, Tumor , Lung Neoplasms/drug therapy , Lung Neoplasms/pathology , Allografts/pathology , Mice, Inbred BALB C , Neoplasm Metastasis/pathology

16.

A general modeling framework for open wildlife populations based on the Polya tree prior.

Diana, Alex; Matechou, Eleni; Griffin, Jim; Arnold, Todd; Tenan, Simone; Volponi, Stefano.

Biometrics ; 79(3): 2171-2183, 2023 09.

Article in English | MEDLINE | ID: mdl-36065934

ABSTRACT

Wildlife monitoring for open populations can be performed using a number of different survey methods. Each survey method gives rise to a type of data and, in the last five decades, a large number of associated statistical models have been developed for analyzing these data. Although these models have been parameterized and fitted using different approaches, they have all been designed to either model the pattern with which individuals enter and/or exit the population, or to estimate the population size by accounting for the corresponding observation process, or both. However, existing approaches rely on a predefined model structure and complexity, either by assuming that parameters linked to the entry and exit pattern (EEP) are specific to sampling occasions, or by employing parametric curves to describe the EEP. Instead, we propose a novel Bayesian nonparametric framework for modeling EEPs based on the Polya tree (PT) prior for densities. Our Bayesian nonparametric approach avoids overfitting when inferring EEPs, while simultaneously allowing more flexibility than is possible using parametric curves. Finally, we introduce the replicate PT prior for defining classes of models for these data allowing us to impose constraints on the EEPs, when required. We demonstrate our new approach using capture-recapture, count, and ring-recovery data for two different case studies.

Subject(s)

Animals, Wild , Models, Statistical , Humans , Animals , Bayes Theorem , Population Density

17.

Nonparametric scanning tests of homogeneity for hierarchical models with continuous covariates.

Todem, David; Hsu, Wei-Wen; Kim, KyungMann.

Biometrics ; 79(3): 2063-2075, 2023 09.

Article in English | MEDLINE | ID: mdl-36454666

ABSTRACT

In many applications of hierarchical models, there is often interest in evaluating the inherent heterogeneity in view of observed data. When the underlying hypothesis involves parameters resting on the boundary of their support space such as variances and mixture proportions, it is a usual practice to entertain testing procedures that rely on common heterogeneity assumptions. Such procedures, albeit omnibus for general alternatives, may entail a substantial loss of power for specific alternatives such as heterogeneity varying with covariates. We introduce a novel and flexible approach that uses covariate information to improve the power to detect heterogeneity, without imposing unnecessary restrictions. With continuous covariates, the approach does not impose a regression model relating heterogeneity parameters to covariates or rely on arbitrary discretizations. Instead, a scanning approach requiring continuous dichotomizations of the covariates is proposed. Empirical processes resulting from these dichotomizations are then used to construct the test statistics, with limiting null distributions shown to be functionals of tight random processes. We illustrate our proposals and results on a popular class of two-component mixture models, followed by simulation studies and applications to two real datasets in cancer and caries research.

Subject(s)

Models, Statistical , Research Design , Computer Simulation , Causality , Correlation of Data

18.

Semiparametric estimation and testing for panel count data with informative interval-censored failure event.

Liu, Li; Su, Wen; Zhao, Xingqiu.

Stat Med ; 42(30): 5596-5615, 2023 12 30.

Article in English | MEDLINE | ID: mdl-37867199

ABSTRACT

Panel count data and interval-censored data are two types of incomplete data that often occur in event history studies. Almost all existing statistical methods are developed for their separate analysis. In this paper, we investigate a more general situation where a recurrent event process and an interval-censored failure event occur together. To intuitively and clearly explain the relationship between the recurrent current process and failure event, we propose a failure time-dependent mean model through a completely unspecified link function. To overcome the challenges arising from the blending of nonparametric components and parametric regression coefficients, we develop a two-stage conditional expected likelihood-based estimation procedure. We establish the consistency, the convergence rate and the asymptotic normality of the proposed two-stage estimator. Furthermore, we construct a class of two-sample tests for comparison of mean functions from different groups. The proposed methods are evaluated by extensive simulation studies and are illustrated with the skin cancer data that motivated this study.

Subject(s)

Skin Neoplasms , Humans , Likelihood Functions , Regression Analysis , Computer Simulation , Time

19.

Modelling count, bounded and skewed continuous outcomes in physical activity research: beyond linear regression models.

Akram, Muhammad; Cerin, Ester; Lamb, Karen E; White, Simon R.

Int J Behav Nutr Phys Act ; 20(1): 57, 2023 05 05.

Article in English | MEDLINE | ID: mdl-37147664

ABSTRACT

BACKGROUND: Inference using standard linear regression models (LMs) relies on assumptions that are rarely satisfied in practice. Substantial departures, if not addressed, have serious impacts on any inference and conclusions; potentially rendering them invalid and misleading. Count, bounded and skewed outcomes, common in physical activity research, can substantially violate LM assumptions. A common approach to handle these is to transform the outcome and apply a LM. However, a transformation may not suffice. METHODS: In this paper, we introduce the generalized linear model (GLM), a generalization of the LM, as an approach for the appropriate modelling of count and non-normally distributed (i.e., bounded and skewed) outcomes. Using data from a study of physical activity among older adults, we demonstrate appropriate methods to analyse count, bounded and skewed outcomes. RESULTS: We show how fitting an LM when inappropriate, especially for the type of outcomes commonly encountered in physical activity research, substantially impacts the analysis, inference, and conclusions compared to a GLM. CONCLUSIONS: GLMs which more appropriately model non-normally distributed response variables should be considered as more suitable approaches for managing count, bounded and skewed outcomes rather than simply relying on transformations. We recommend that physical activity researchers add the GLM to their statistical toolboxes and become aware of situations when GLMs are a better method than traditional approaches for modeling count, bounded and skewed outcomes.

Subject(s)

Exercise , Aged , Humans , Linear Models

20.

Bayesian areal disaggregation regression to predict wildlife distribution and relative density with low-resolution data.

Murphy, Kilian J; Ciuti, Simone; Burkitt, Tim; Morera-Pujol, Virginia.

Ecol Appl ; 33(8): e2924, 2023 Dec.

Article in English | MEDLINE | ID: mdl-37804526

ABSTRACT

For species of conservation concern and human-wildlife conflict, it is imperative that spatial population data be available to design adaptive-management strategies and be prepared to meet challenges such as land use and climate change, disease outbreaks, and invasive species spread. This can be difficult, perhaps impossible, if spatially explicit wildlife data are not available. Low-resolution areal counts, however, are common in wildlife monitoring, that is, the number of animals reported for a region, usually corresponding to administrative subdivisions, for example, region, province, county, departments, or cantons. Bayesian areal disaggregation regression is a solution to exploit areal counts and provide conservation biologists with high-resolution species distribution predictive models. This method originated in epidemiology but lacks experimentation in ecology. It provides a plethora of applications to change the way we collect and analyze data for wildlife populations. Based on high-resolution environmental rasters, the disaggregation method disaggregates the number of individuals observed in a region and distributes them at the pixel level (e.g., 5 × 5 km or finer resolution), thereby converting low-resolution data into a high-resolution distribution and indices of relative density. In our demonstrative study, we disaggregated areal count data from hunting bag returns to disentangle the changing distribution and population dynamics of three deer species (red, sika, and fallow) in Ireland from 2000 to 2018. We show an application of the Bayesian areal disaggregation regression method and document marked increases in relative population density and extensive range expansion for each of the three deer species across Ireland. We challenged our disaggregated model predictions by correlating them with independent deer surveys carried out in field sites and alternative deer distribution models built using presence-only and presence-absence data. Finding a high correlation with both independent data sets, we highlighted the ability of Bayesian areal disaggregation regression to accurately capture fine-scale spatial patterns of animal distribution. This study uncovers new scenarios for wildlife managers and conservation biologists to reliably use regional count data disregarded so far in species distribution modeling. Thus, it represents a step forward in our ability to monitor wildlife population and meet challenges in our changing world.

Subject(s)

Animals, Wild , Deer , Animals , Humans , Bayes Theorem , Specific Gravity , Population Dynamics

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL