1.
Can J Stat ; 51(2): 355-374, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37346757

ABSTRACT

Consider the setting where (i) individual-level data are collected to build a regression model for the association between an event of interest and certain covariates, and (ii) some risk calculators predicting the risk of the event using less detailed covariates are available, possibly as algorithmic black boxes with little information available about how they were built. We propose a general empirical-likelihood-based framework to integrate the rich auxiliary information contained in the calculators into fitting the regression model, to make the estimation of regression parameters more efficient. Two methods are developed, one using working models to extract the calculator information and one making a direct use of calculator predictions without working models. Theoretical and numerical investigations show that the calculator information can substantially reduce the variance of regression parameter estimation. As an application, we study the dependence of the risk of high grade prostate cancer on both conventional risk factors and newly identified molecular biomarkers by integrating information from the Prostate Biopsy Collaborative Group (PBCG) risk calculator, which was built based on conventional risk factors alone.


2.
Ann Stat ; 49(2): 793-819, 2021 Apr.
Article in English | MEDLINE | ID: mdl-35510045

ABSTRACT

In Learn-As-you-GO (LAGO) adaptive studies, the intervention is a complex multicomponent package, and is adapted in stages during the study based on past outcome data. This design formalizes standard practice in public health intervention studies. An effective intervention package is sought, while minimizing intervention package cost. In LAGO study data, the interventions in later stages depend upon the outcomes in the previous stages, violating standard statistical theory. We develop an estimator for the intervention effects, and prove consistency and asymptotic normality using a novel coupling argument, ensuring the validity of the test for the hypothesis of no overall intervention effect. We develop a confidence set for the optimal intervention package and confidence bands for the success probabilities under alternative package compositions. We illustrate our methods in the BetterBirth Study, which aimed to improve maternal and neonatal outcomes among 157,689 births in Uttar Pradesh, India through a multicomponent intervention package.

3.
J Stat Comput Simul ; 88(3): 575-596, 2018.
Article in English | MEDLINE | ID: mdl-29731525

ABSTRACT

We develop a Bayesian variable selection method for logistic regression models that can simultaneously accommodate qualitative covariates and interaction terms under various heredity constraints. We use expectation-maximization variable selection (EMVS) with a deterministic annealing variant as the platform for our method, due to its proven flexibility and efficiency. We propose a variance adjustment of the priors for the coefficients of qualitative covariates, which controls false-positive rates, and a flexible parameterization for interaction terms, which accommodates user-specified heredity constraints. This method can handle all pairwise interaction terms as well as a subset of specific interactions. Using simulation, we show that this method selects associated covariates better than the grouped LASSO and the LASSO with heredity constraints in various exploratory research scenarios encountered in epidemiological studies. We apply our method to identify genetic and non-genetic risk factors associated with smoking experimentation in a cohort of Mexican-heritage adolescents.

4.
J Appl Stat ; 51(8): 1497-1523, 2024.
Article in English | MEDLINE | ID: mdl-38863802

ABSTRACT

Plant breeders want to develop cultivars that outperform existing genotypes. Some characteristics (here 'main traits') of these cultivars are categorical and difficult to measure directly. It is important to predict the main trait of newly developed genotypes accurately. In addition to marker data, breeding programs often have information on secondary traits (or 'phenotypes') that are easy to measure. Our goal is to improve prediction of main traits with interpretable relations by combining the two data types using variable selection techniques. However, the genomic characteristics can overwhelm the set of secondary traits, so a standard technique may fail to select any phenotypic variables. We develop a new statistical technique that ensures appropriate representation from both the secondary traits and the genotypic variables for optimal prediction. When two data types (markers and secondary traits) are available, we achieve improved prediction of a binary trait by two steps that are designed to ensure that a significant intrinsic effect of a phenotype is incorporated in the relation before accounting for extra effects of genotypes. First, we sparsely regress the secondary traits on the markers and replace the secondary traits by their residuals to obtain the effects of phenotypic variables as adjusted by the genotypic variables. Then, we develop a sparse logistic classifier using the markers and residuals so that the adjusted phenotypes may be selected first to avoid being overwhelmed by the genotypic variables due to their numerical advantage. This classifier uses forward selection aided by a penalty term and can be computed effectively by a technique called the one-pass method. It compares favorably with other classifiers on simulated and real data.

5.
Mathematics (Basel) ; 12(2)2024 Jan.
Article in English | MEDLINE | ID: mdl-38773986

ABSTRACT

Epidemiological studies often encounter a challenge due to exposure measurement error when estimating an exposure-disease association. A surrogate variable may be available for the true unobserved exposure variable. However, zero-inflated data are encountered frequently in the surrogate variables. For example, many nutrient or physical activity measures may have a zero value (or a low detectable value) among a group of individuals. In this paper, we investigate regression analysis when the observed surrogates may have zero values among some individuals of the whole study cohort. A naive regression calibration without taking into account a probability mass of the surrogate variable at 0 (or a low detectable value) will be biased. We developed a regression calibration estimator which typically can have smaller biases than the naive regression calibration estimator. We propose an expected estimating equation estimator which is consistent under the zero-inflated surrogate regression model. Extensive simulations show that the proposed estimator performs well in terms of bias correction. These methods are applied to a physical activity intervention study.

6.
J Appl Stat ; 51(7): 1399-1411, 2024.
Article in English | MEDLINE | ID: mdl-38835824

ABSTRACT

The Hosmer-Lemeshow (HL) test is a commonly used global goodness-of-fit (GOF) test that assesses the quality of the overall fit of a logistic regression model. In this paper, we give results from simulations showing that the type I error rate (and hence power) of the HL test decreases as model complexity grows, provided that the sample size remains fixed and binary replicates (multiple Bernoulli trials) are present in the data. We demonstrate that a generalized version of the HL test (GHL) presented in previous work can offer some protection against this power loss. These results are also supported by application of both the HL and GHL test to a real-life data set. We conclude with a brief discussion explaining the behavior of the HL test, along with some guidance on how to choose between the two tests. In particular, we suggest the GHL test to be used when there are binary replicates or clusters in the covariate space, provided that the sample size is sufficiently large.
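
For readers unfamiliar with the statistic, the following is a minimal sketch of the standard HL computation, not the authors' code and not the generalized GHL variant. The function name `hosmer_lemeshow`, the equal-size grouping rule, and the toy data are assumptions made for illustration. The statistic is usually compared with a chi-square distribution on `groups - 2` degrees of freedom.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, degrees_of_freedom) for the Hosmer-Lemeshow test.

    y: observed 0/1 outcomes; p: fitted probabilities from a logistic model.
    Observations are sorted by fitted probability and split into `groups`
    bins of (nearly) equal size. Illustrative sketch only.
    """
    if len(y) != len(p):
        raise ValueError("y and p must have the same length")
    pairs = sorted(zip(p, y), key=lambda t: t[0])  # stable sort by probability
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue  # can happen when n < groups
        n_g = len(chunk)
        observed = sum(yi for _, yi in chunk)   # observed events in the bin
        expected = sum(pi for pi, _ in chunk)   # expected events in the bin
        mean_p = expected / n_g
        # Assumes 0 < mean_p < 1; otherwise the denominator is zero.
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1.0 - mean_p))
    return stat, groups - 2


# Toy data (hypothetical): eight observations, four bins of two.
p_hat = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]
y_obs = [0, 0, 1, 1, 1, 1, 1, 1]
stat, df = hosmer_lemeshow(y_obs, p_hat, groups=4)
```

When many observations share the same fitted probability, as with the binary replicates discussed in the abstract, the split into bins depends on how ties are ordered, so this simple grouping rule should be treated with caution.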

7.
J Appl Stat ; 51(5): 866-890, 2024.
Article in English | MEDLINE | ID: mdl-38524798

ABSTRACT

Despite the vast advantages of making antenatal care visits, the service utilization among pregnant women in Nigeria is suboptimal. A five-year monitoring estimate indicated that about 24% of the women who had live births made no visit. The non-utilization induced excessive zeroes in the outcome of interest. Thus, this study adopted a zero-inflated negative binomial model within a Bayesian framework to identify the spatial pattern and the key factors hindering antenatal care utilization in Nigeria. We overcome the intractability associated with posterior inference by adopting a Pólya-Gamma data-augmentation technique to facilitate inference. The Gibbs sampling algorithm was used to draw samples from the joint posterior distribution. Results revealed that type of place of residence, maternal level of education, access to mass media, household work index, and woman's working status have significant effects on the use of antenatal care services. Findings identified substantial state-level spatial disparity in antenatal care utilization across the country. Cost-effective techniques to achieve an acceptable frequency of utilization include the creation of a community-specific awareness to emphasize the importance and benefits of the appropriate utilization. Special consideration should be given to older pregnant women, women in poor antenatal utilization states, and women residing in poor road network regions.

8.
Heliyon ; 9(12): e23063, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38058455

ABSTRACT

Introduction: This article aims to determine the effectiveness and extent of measures taken to decrease the severity of traffic crashes in Barcelona from 2013 to 2018. This will be achieved through an analysis of the traffic crash data. Method: Our approach involves the use of binary logistic regression models. We rely on the traffic crash dataset from 2010-2019 available in the Open Data Barcelona platform. Results: The outcomes obtained from the suggested models are contrasted with the strategies outlined in the Local Road Safety Plan 2013-2018 to minimize the severity of crashes. Effective preventive actions were identified, such as road safety educational programs, creating calm zones, enhancing pedestrian crossings, or expanding bicycle lanes. However, certain measures were found to be ineffective or their impact remained uncertain. Conclusions: Our findings indicate that the measures implemented in Barcelona may have contributed to the decrease in the severity of traffic incidents over the past decade. Notably, fatalities have decreased more than severe injuries. More attention should be given to less effective measures such as speed controls and drug/alcohol testing.

9.
Bayesian Anal ; -1(-1): 1-36, 2023 Jan 01.
Article in English | MEDLINE | ID: mdl-36714467

ABSTRACT

Geographically weighted regression (GWR) models handle geographical dependence through a spatially varying coefficient model and have been widely used in applied science, but their general Bayesian extension is unclear because it involves a weighted log-likelihood that does not imply a probability distribution on the data. We present a Bayesian GWR model and show that its essence is dealing with partial misspecification of the model. Current modularized Bayesian inference models accommodate partial misspecification from a single component of the model. We extend these models to handle partial misspecification in more than one component of the model, as required for our Bayesian GWR model. Information from the various spatial locations is manipulated via a geographically weighted kernel and the optimal manipulation is chosen according to a Kullback-Leibler (KL) divergence. We justify the model via an information risk minimization approach and show the consistency of the proposed estimator in terms of a geographically weighted KL divergence.

10.
J Appl Stat ; 49(11): 2845-2869, 2022.
Article in English | MEDLINE | ID: mdl-36093035

ABSTRACT

When the observed proportion of zeros in a data set consisting of binary outcome data is larger than expected under a regular logistic regression model, it is frequently suggested to use a zero-inflated Bernoulli (ZIB) regression model. A spline-based ZIB regression model is proposed to describe the potentially nonlinear effect of a continuous covariate. A spline is used to approximate the unknown smooth function. Under the smoothness condition, the spline estimator of the unknown smooth function is uniformly consistent, and the regression parameter estimators are asymptotically normally distributed. We propose an easily implemented and consistent estimation method for the variances of the regression parameter estimators. Extensive simulations are conducted to investigate the finite-sample performance of the proposed method. A real-life data set is used to illustrate the practical use of the proposed methodology. The real-life data analysis indicates that the prediction performance of the proposed semiparametric ZIB regression model is better compared to the parametric ZIB regression model.
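
As a rough illustration of the ZIB structure described in this abstract, the sketch below evaluates a ZIB log-likelihood under the common formulation P(Y = 1) = (1 - omega) * expit(eta). The function name `zib_log_likelihood`, the constant zero-inflation probability `omega`, and the toy inputs are assumptions. The spline term of the paper is not implemented; it would simply enter through the linear predictor `eta`.

```python
import math


def zib_log_likelihood(y, eta, omega):
    """Log-likelihood of a zero-inflated Bernoulli model (illustrative sketch).

    y:     observed 0/1 outcomes
    eta:   linear predictors of the logistic part (may include a spline term)
    omega: zero-inflation probability in [0, 1), assumed constant here
    """
    total = 0.0
    for yi, ei in zip(y, eta):
        pi = 1.0 / (1.0 + math.exp(-ei))   # logistic success probability
        p_one = (1.0 - omega) * pi         # P(Y = 1) after zero inflation
        # P(Y = 0) = omega + (1 - omega) * (1 - pi) = 1 - p_one
        total += math.log(p_one) if yi == 1 else math.log(1.0 - p_one)
    return total


# With omega = 0 the model reduces to ordinary logistic regression.
ll_inflated = zib_log_likelihood([1, 0], [0.0, 0.0], omega=0.5)
ll_plain = zib_log_likelihood([1, 0], [0.0, 0.0], omega=0.0)
```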

11.
J Appl Stat ; 49(1): 143-168, 2022.
Article in English | MEDLINE | ID: mdl-35707801

ABSTRACT

Under a unit-level bivariate linear mixed model, this paper introduces small area predictors of expenditure means and ratios, and derives approximations and estimators of the corresponding mean squared errors. For the considered model, the REML estimation method is implemented. Several simulation experiments, designed to analyze the behavior of the introduced fitting algorithm, predictors and mean squared error estimators, are carried out. An application to real data from the Spanish household budget survey illustrates the behavior of the proposed statistical methodology. The target is the estimation of means of food and non-food household annual expenditures and of ratios of food household expenditures by Spanish provinces.

12.
J Appl Stat ; 48(4): 669-692, 2021.
Article in English | MEDLINE | ID: mdl-35706991

ABSTRACT

Limited generalized linear models, namely generalized linear models with limited dependent variables, are a major research topic and have been developed in many research fields. However, quasi-likelihood estimation of these models remains an unresolved issue because of the limited dependent variables. We propose a novel quasi-likelihood, called Taylor quasi-likelihood, to handle the unified estimation problem for limited models. It is based on a Taylor expansion of the distribution function or likelihood function. We also extend the likelihood to a generalized version and an adaptive version, and propose a distributed procedure to obtain the likelihood estimator. In the low-dimensional setting, we give selection criteria for the proposed method and argue for the consistency and asymptotic normality of the estimator. In the high-dimensional setting, we discuss feature selection and oracle properties of the proposed method. Simulation results confirm the advantages of the proposed method.

13.
J Appl Stat ; 48(13-15): 2864-2888, 2021.
Article in English | MEDLINE | ID: mdl-35707079

ABSTRACT

The logit binomial logistic dose response model is commonly used in applied research to model binary outcomes as a function of the dose or concentration of a substance. This model is easily tailored to assess the relative potency of two substances. Consequently, in instances where two such dose response curves are parallel so one substance can be viewed as a dilution of the other, the degree of that dilution is captured in the relative potency model parameter. It is incumbent on experimental researchers working in fields including biomedicine, environmental science, toxicology and applied sciences to choose efficient experimental designs, both to fit their dose response curves and to garner important information regarding drug or substance potency. This article provides far-reaching practical design strategies for dose response model fitting and estimation of relative potency using key illustrations. These results are subsequently extended here to handle situations where the assessment of parallelism and the proper dose-scale are also of interest. Conclusions and recommended strategies are supported by both theoretical and simulation results.
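
To make the "dilution" reading of relative potency concrete, here is a small sketch under an assumed parameterization, logit P = alpha + beta * log(dose), with a common slope `beta` for both substances. The function names and the numeric values are hypothetical and are not taken from the article. Under this parameterization, parallel curves imply that the test substance at dose d responds like the reference at dose rho * d, with rho = exp((alpha_test - alpha_ref) / beta).

```python
import math


def response_prob(alpha, beta, dose):
    """P(response) under logit P = alpha + beta * log(dose) (assumed form)."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * math.log(dose))))


def relative_potency(alpha_ref, alpha_test, beta):
    """Relative potency rho for two parallel logistic dose-response curves."""
    return math.exp((alpha_test - alpha_ref) / beta)


# Hypothetical parameters: the test substance is four times as potent.
alpha_ref, beta = -2.0, 1.5
alpha_test = alpha_ref + beta * math.log(4.0)
rho = relative_potency(alpha_ref, alpha_test, beta)

# One unit of the test substance matches rho units of the reference.
p_test = response_prob(alpha_test, beta, 1.0)
p_ref = response_prob(alpha_ref, beta, rho)
```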

14.
J Appl Stat ; 48(5): 765-785, 2021.
Article in English | MEDLINE | ID: mdl-35707447

ABSTRACT

Using a multivariate latent variable approach, this article proposes some new general models to analyze correlated bounded continuous and categorical (nominal and/or ordinal) responses with and without non-ignorable missing values. First, we discuss regression methods for jointly analyzing continuous, nominal, and ordinal responses, motivated by data from studies of toxicity development. Second, using the beta and Dirichlet distributions, we extend the models so that some bounded continuous responses are substituted for continuous responses. The joint distribution of the bounded continuous, nominal and ordinal variables is decomposed into a marginal multinomial distribution for the nominal variable and a conditional multivariate joint distribution for the bounded continuous and ordinal variables given the nominal variable. We estimate the regression parameters under the new general location models using the maximum-likelihood method. Sensitivity analysis is also performed to study the influence of small perturbations of the parameters of the missingness mechanisms of the model on the maximal normal curvature. The proposed models are applied to two data sets: BMI, Steatosis and Osteoporosis data and Tehran household expenditure budgets.

15.
J Appl Stat ; 47(13-15): 2641-2657, 2020.
Article in English | MEDLINE | ID: mdl-35707435

ABSTRACT

When applying analysis of variance, the sample sizes may not be known in advance, so it is more appropriate to consider them as realizations of random variables. A motivating example is the collection of observations during a fixed time span in a study comparing, for example, several pathologies of patients arriving at a hospital. This paper extends the theory of analysis of variance to those situations, considering mixed effects models. We assume that the occurrences of observations correspond to a counting process and that the sample sizes follow a Poisson distribution. The proposed approach is applied to a study of cancer patients.

16.
J Appl Stat ; 47(13-15): 2879-2894, 2020.
Article in English | MEDLINE | ID: mdl-35707418

ABSTRACT

In this study, a logistic regression model is applied to credit scoring data from a Portuguese financial institution to evaluate the default risk of consumer loans. It was found that the risk of default increases with the loan spread, loan term and age of the customer, but decreases if the customer owns more credit cards. Clients who receive their salary at the same banking institution that issued the loan are less likely to default than clients who receive their salary at another institution. We also found that clients in the lowest income tax bracket are more prone to default. The model predicted default correctly in 89.79% of the cases.

17.
Commun Stat Simul Comput ; 47(6): 1722-1738, 2018 Jul 05.
Article in English | MEDLINE | ID: mdl-30555205

ABSTRACT

Overdispersion is a problem encountered in the analysis of count data that can lead to invalid inference if unaddressed. The decision about whether data are overdispersed is often made by checking whether the ratio of the Pearson chi-square statistic to its degrees of freedom is greater than one; however, there is currently no fixed threshold for declaring the need for statistical intervention. We consider simulated cross-sectional and longitudinal datasets containing varying magnitudes of overdispersion caused by outliers or zero inflation, as well as real datasets, to determine an appropriate threshold value of this statistic which indicates when overdispersion should be addressed.
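
The ratio mentioned in this abstract can be sketched as follows for a Poisson model, where the variance of each count equals its fitted mean. The function name `pearson_dispersion`, the intercept-only fit, and the toy counts are assumptions for illustration. No threshold is encoded here, since the abstract does not state one.

```python
def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square statistic divided by its residual degrees of freedom.

    y: observed counts; mu: fitted Poisson means (all > 0);
    n_params: number of estimated regression parameters.
    """
    if len(y) != len(mu):
        raise ValueError("y and mu must have the same length")
    df = len(y) - n_params
    if df <= 0:
        raise ValueError("need more observations than parameters")
    # Poisson variance function: Var(Y_i) = mu_i
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / df


# Toy counts (hypothetical) with one large outlier, fitted by an
# intercept-only Poisson model whose fitted mean is the sample mean.
counts = [0, 0, 0, 1, 1, 2, 3, 5, 8, 20]
mean = sum(counts) / len(counts)
ratio = pearson_dispersion(counts, [mean] * len(counts), n_params=1)
```

A ratio well above one, as in this toy example, is the usual informal signal of overdispersion.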

18.
Stat Interface ; 9(2): 147-158, 2016.
Article in English | MEDLINE | ID: mdl-34221214

ABSTRACT

The case-control design provides an effective way to collect covariate information conditioning on subjects' disease status. The standard logistic regression model can be used to model the interaction between two covariates under such a design, but the prospective logistic regression method might not be the most efficient one when certain appropriate constraints can be imposed on the covariate distribution. We develop a hybrid approach for the statistical inference of the interaction under the case-control design. We use a parametric model to characterize the conditional distribution of one covariate given the other covariate in the control population, while leaving the distribution of the latter covariate fully nonparametric. A maximum hybrid parametric and empirical likelihood method is adopted for the evaluation of all parameters. The estimator and the associated test derived from the proposed semiparametric model are suitable for evaluating the interaction between two covariates of various types (discrete or continuous). Asymptotic results for both the estimators and the test statistics are established, and the advantages of the proposed method over the existing ones are demonstrated through simulation results and a real data example.
