Results 1 - 20 of 66
1.
Biometrika ; 111(3): 971-988, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39239267

ABSTRACT

Interval-censored multistate data arise in many studies of chronic diseases, where the health status of a subject can be characterized by a finite number of disease states and the transition between any two states is only known to occur over a broad time interval. We relate potentially time-dependent covariates to multistate processes through semiparametric proportional intensity models with random effects. We study nonparametric maximum likelihood estimation under general interval censoring and develop a stable expectation-maximization algorithm. We show that the resulting parameter estimators are consistent and that the finite-dimensional components are asymptotically normal with a covariance matrix that attains the semiparametric efficiency bound and can be consistently estimated through profile likelihood. In addition, we demonstrate through extensive simulation studies that the proposed numerical and inferential procedures perform well in realistic settings. Finally, we provide an application to a major epidemiologic cohort study.
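The entry's EM algorithm targets a full nonparametric multistate likelihood; as a much smaller illustration of expectation-maximization under interval censoring, the hypothetical sketch below fits a single exponential event-time rate when each event is only known to lie in an interval (the function name and setup are illustrative, not the authors' algorithm).

```python
import numpy as np

def em_exponential_interval_censored(L, R, n_iter=200, rate_init=1.0):
    """Toy EM for an exponential rate under interval censoring.

    L, R: interval endpoints; R may be np.inf (right censoring).
    The complete-data MLE for an exponential rate is n / sum(T_i), so the
    M-step plugs in E[T_i | L_i < T_i < R_i] computed in the E-step.
    """
    L = np.asarray(L, dtype=float)
    R = np.asarray(R, dtype=float)
    rate = rate_init
    for _ in range(n_iter):
        # E-step: conditional mean of T given the observed interval.
        expT = np.empty_like(L)
        finite = np.isfinite(R)
        # Right-censored: memorylessness gives E[T | T > L] = L + 1/rate.
        expT[~finite] = L[~finite] + 1.0 / rate
        a, b = L[finite], R[finite]
        ea, eb = np.exp(-rate * a), np.exp(-rate * b)
        expT[finite] = ((a + 1.0 / rate) * ea - (b + 1.0 / rate) * eb) / (ea - eb)
        # M-step: complete-data MLE with the expected sufficient statistic.
        rate = len(L) / expT.sum()
    return rate

# Example: events observed only up to unit-wide intervals; true rate is 0.5.
rng = np.random.default_rng(0)
T = rng.exponential(scale=2.0, size=5000)
L = np.floor(T)
R = L + 1.0
print(em_exponential_interval_censored(L, R))   # approximately 0.5
```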

2.
Stat Theory Relat Fields ; 8(1): 69-79, 2024.
Article in English | MEDLINE | ID: mdl-39206429

ABSTRACT

Causal inference plays a crucial role in biomedical studies and social sciences. Over the years, researchers have devised various methods to facilitate causal inference, particularly in observational studies. Among these methods, the doubly robust estimator distinguishes itself through a remarkable feature: it retains its consistency even when only one of the two components (either the propensity score model or the outcome mean model) is correctly specified, rather than demanding correctness in both simultaneously. In this paper, we focus on scenarios where semiparametric models are employed for both the propensity score and the outcome mean. Semiparametric models offer a valuable blend of the interpretability of parametric models and the adaptability of nonparametric models. In this context, achieving correct model specification involves both accurately specifying the unknown function and consistently estimating the unknown parameter. We introduce a novel concept: the relaxed doubly robust estimator. It operates in a manner reminiscent of the traditional doubly robust estimator but with a reduced requirement for double robustness. In essence, it only mandates consistent estimation of the unknown parameter, without requiring correct specification of the unknown function. This means that it only necessitates a partially correct model specification. We conduct a thorough analysis to establish the double robustness and semiparametric efficiency of our proposed estimator. Furthermore, we bolster our findings with comprehensive simulation studies to illustrate the practical implications of our approach.
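For background, a minimal sketch of the classical doubly robust (AIPW) estimator of an average treatment effect that the relaxed estimator builds on, here with a parametric logistic propensity model and a linear outcome model rather than the semiparametric models studied in the entry; all names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def aipw_ate(X, A, Y):
    """Classical augmented IPW (doubly robust) estimate of E[Y(1)] - E[Y(0)]."""
    ps = LogisticRegression(max_iter=1000).fit(X, A).predict_proba(X)[:, 1]
    mu1 = LinearRegression().fit(X[A == 1], Y[A == 1]).predict(X)
    mu0 = LinearRegression().fit(X[A == 0], Y[A == 0]).predict(X)
    # Outcome-model prediction plus an inverse-probability-weighted residual
    # correction: consistent if either the propensity or the outcome model is correct.
    psi1 = mu1 + A * (Y - mu1) / ps
    psi0 = mu0 + (1 - A) * (Y - mu0) / (1 - ps)
    return np.mean(psi1 - psi0)

# Example with a simple simulated confounded data set.
rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 2))
A = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
Y = 1.0 * A + X @ np.array([0.5, -0.3]) + rng.normal(size=n)
print(aipw_ate(X, A, Y))   # close to the true effect of 1.0
```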

3.
Biometrics ; 80(2)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38768225

ABSTRACT

Conventional supervised learning usually operates under the premise that data are collected from the same underlying population. However, challenges may arise when integrating new data from different populations, resulting in a phenomenon known as dataset shift. This paper focuses on prior probability shift, where the distribution of the outcome varies across datasets but the conditional distribution of features given the outcome remains the same. To tackle the challenges posed by such a shift, we propose an estimation algorithm that can efficiently combine information from multiple sources. Unlike existing methods that are restricted to discrete outcomes, the proposed approach accommodates both discrete and continuous outcomes. It also handles high-dimensional covariate vectors through variable selection using an adaptive least absolute shrinkage and selection operator penalty, producing efficient estimates that possess the oracle property. Moreover, a novel semiparametric likelihood ratio test is proposed to check the validity of prior probability shift assumptions by embedding the null conditional density function into Neyman's smooth alternatives (Neyman, 1937) and testing study-specific parameters. We demonstrate the effectiveness of our proposed method through extensive simulations and a real data example. The proposed methods serve as a useful addition to the repertoire of tools for dealing with dataset shifts.


Subjects
Algorithms, Computer Simulation, Statistical Models, Probability, Humans, Likelihood Functions, Biometry/methods, Statistical Data Interpretation, Supervised Machine Learning
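The adaptive lasso penalty mentioned in this entry is commonly implemented by rescaling features with an initial coefficient estimate and then running an ordinary lasso; the sketch below shows that generic trick for a linear outcome and is not the entry's multi-source estimation algorithm.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def adaptive_lasso(X, y, alpha=0.1, gamma=1.0):
    """Adaptive lasso via rescaling: penalty alpha * sum_j |beta_j| / |beta_init_j|^gamma."""
    beta_init = LinearRegression().fit(X, y).coef_
    w = np.abs(beta_init) ** gamma + 1e-8     # adaptive weights (tiny offset avoids /0)
    Xw = X * w                                 # lasso on rescaled features ...
    fit = Lasso(alpha=alpha, max_iter=10000).fit(Xw, y)
    return fit.coef_ * w                       # ... then map back to the original scale

# Example: sparse linear model with 3 nonzero coefficients.
rng = np.random.default_rng(2)
n, p = 500, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + rng.normal(size=n)
print(np.round(adaptive_lasso(X, y), 2))       # roughly recovers the 3 nonzero coefficients
```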
4.
Biom J ; 66(4): e2300113, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38801216

ABSTRACT

In observational studies, instrumental variable (IV) methods are commonly applied when there are unmeasured covariates. In Mendelian randomization, constructing an allele score using many single nucleotide polymorphisms is often implemented; however, including some invalid IVs risks producing biased estimates of causal effects. Invalid IVs are those IV candidates that are associated with unobserved variables. To solve this problem, we developed a novel strategy using negative control outcomes (NCOs) as auxiliary variables. Using NCOs, we are able to select only valid IVs and exclude invalid IVs without knowing which of the instruments are invalid. We also developed a new two-step estimation procedure and proved the semiparametric efficiency of our estimator. Simulations showed that the performance of our proposed method was superior to that of some previous methods. Subsequently, we applied the proposed method to the UK Biobank dataset. Our results demonstrate that the use of an auxiliary variable, such as an NCO, enables the selection of valid IVs with assumptions different from those used in previous methods.


Subjects
Biometry, Humans, Biometry/methods, Single Nucleotide Polymorphism, Mendelian Randomization Analysis/methods
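For background on the instrumental-variable machinery used in this entry, a bare-bones two-stage least squares sketch with one exposure and several candidate instruments; it does not implement the negative-control-based selection of valid instruments, and all names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def two_stage_least_squares(Z, D, Y):
    """2SLS: regress exposure D on instruments Z, then Y on the fitted exposure."""
    d_hat = LinearRegression().fit(Z, D).predict(Z)            # first stage
    second = LinearRegression().fit(d_hat.reshape(-1, 1), Y)   # second stage
    return second.coef_[0]                                     # causal effect of D on Y

# Simulated example with an unmeasured confounder U and three valid instruments.
rng = np.random.default_rng(3)
n = 5000
Z = rng.normal(size=(n, 3))
U = rng.normal(size=n)                                         # unmeasured confounder
D = Z @ np.array([1.0, 0.5, 0.5]) + U + rng.normal(size=n)
Y = 2.0 * D + U + rng.normal(size=n)
print(two_stage_least_squares(Z, D, Y))                        # close to the true effect 2.0
```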
5.
Biometrics ; 80(2)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38563532

ABSTRACT

Deep learning has continuously attained huge success in diverse fields, while its application to survival data analysis remains limited and deserves further exploration. For the analysis of current status data, a deep partially linear Cox model is proposed to circumvent the curse of dimensionality. Modeling flexibility is attained by using deep neural networks (DNNs) to accommodate nonlinear covariate effects and monotone splines to approximate the baseline cumulative hazard function. We establish the convergence rate of the proposed maximum likelihood estimators. Moreover, we derive that the finite-dimensional estimator for treatment covariate effects is root-n-consistent, asymptotically normal, and attains semiparametric efficiency. Finally, we demonstrate the performance of our procedures through extensive simulation studies and application to real-world data on news popularity.


Subjects
Proportional Hazards Models, Likelihood Functions, Survival Analysis, Computer Simulation, Linear Models
6.
Biometrics ; 80(1)2024 Jan 29.
Article in English | MEDLINE | ID: mdl-38364804

ABSTRACT

Researchers interested in understanding the relationship between a readily available longitudinal binary outcome and a novel biomarker exposure can be confronted with ascertainment costs that limit sample size. In such settings, two-phase studies can be cost-effective solutions that allow researchers to target informative individuals for exposure ascertainment and increase estimation precision for time-varying and/or time-fixed exposure coefficients. In this paper, we introduce a novel class of residual-dependent sampling (RDS) designs that select informative individuals using data available on the longitudinal outcome and inexpensive covariates. Together with the RDS designs, we propose a semiparametric analysis approach that efficiently uses all data to estimate the parameters. We describe a numerically stable and computationally efficient EM algorithm to maximize the semiparametric likelihood. We examine the finite sample operating characteristics of the proposed approaches through extensive simulation studies, and compare the efficiency of our designs and analysis approach with existing ones. We illustrate the usefulness of the proposed RDS designs and analysis method in practice by studying the association between a genetic marker and poor lung function among patients enrolled in the Lung Health Study (Connett et al., 1993).


Subjects
Statistical Models, Humans, Computer Simulation, Sample Size, Probability, Statistical Data Interpretation, Sampling Studies, Longitudinal Studies
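A hedged sketch of the residual-dependent selection idea in this entry: fit an inexpensive phase-I working model for the outcome on covariates available for everyone, then target the individuals with the most extreme residuals for costly exposure ascertainment. The simple cross-sectional logistic working model and the names used here are illustrative stand-ins for the longitudinal design studied in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def residual_dependent_sample(y_binary, Z_cheap, n_phase2):
    """Select a phase-II subsample with the largest absolute working-model residuals."""
    model = LogisticRegression(max_iter=1000).fit(Z_cheap, y_binary)
    resid = y_binary - model.predict_proba(Z_cheap)[:, 1]   # response residuals
    return np.argsort(-np.abs(resid))[:n_phase2]            # indices to assay in phase II

# Example: pick the 100 most informative of 1000 subjects based on cheap covariates.
rng = np.random.default_rng(4)
n = 1000
Z = rng.normal(size=(n, 3))
y = rng.binomial(1, 1 / (1 + np.exp(-Z[:, 0])))
idx = residual_dependent_sample(y, Z, n_phase2=100)
print(len(idx), idx[:5])
```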
7.
J R Stat Soc Series B Stat Methodol ; 85(3): 575-596, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37521165

ABSTRACT

We propose a test-based elastic integrative analysis of the randomised trial and real-world data to estimate treatment effect heterogeneity with a vector of known effect modifiers. When the real-world data are not subject to bias, our approach combines the trial and real-world data for efficient estimation. Utilising the trial design, we construct a test to decide whether or not to use real-world data. We characterise the asymptotic distribution of the test-based estimator under local alternatives. We provide a data-adaptive procedure to select the test threshold that promises the smallest mean square error and an elastic confidence interval with a good finite-sample coverage property.

8.
Commun Stat Theory Methods ; 52(16): 5767-5798, 2023.
Article in English | MEDLINE | ID: mdl-37484707

ABSTRACT

When effect modifiers influence the decision to participate in randomized trials, generalizing causal effect estimates to an external target population requires the knowledge of two scores: the propensity score for receiving treatment and the sampling score for trial participation. While the former score is known due to randomization, the latter score is usually unknown and estimated from data. Under unconfounded trial participation, we characterize the asymptotic efficiency bounds for estimating two causal estimands, the population average treatment effect and the average treatment effect among the non-participants, and examine the role of the scores. We also study semiparametric efficient estimators that directly balance the weighted trial sample toward the target population, and illustrate their operating characteristics via simulations.
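To make the two scores concrete, a minimal inverse-probability-of-sampling-weighted sketch of the average treatment effect among non-participants: the sampling score is estimated by stacking the trial and target samples, while the trial propensity score is known from randomization. This is a generic weighting illustration, not the entry's efficient balancing estimator; all names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipsw_ate_nonparticipants(X_trial, A_trial, Y_trial, X_target, p_treat=0.5):
    """Weight trial subjects by the odds of non-participation (1 - pi(x)) / pi(x),
    where pi(x) = P(trial | X) is fitted on the stacked samples; this targets the
    ATE in the population represented by the target sample."""
    X_all = np.vstack([X_trial, X_target])
    S = np.r_[np.ones(len(X_trial)), np.zeros(len(X_target))]   # trial indicator
    samp = LogisticRegression(max_iter=1000).fit(X_all, S).predict_proba(X_trial)[:, 1]
    w = (1.0 - samp) / samp
    # Within-trial IPW contrast using the known randomization probability.
    contrast = A_trial * Y_trial / p_treat - (1 - A_trial) * Y_trial / (1 - p_treat)
    return np.sum(w * contrast) / np.sum(w)

# Example: trial over-represents high covariate values; constant true effect of 1.0.
rng = np.random.default_rng(5)
X_tr = rng.normal(loc=0.5, size=(800, 2))
A_tr = rng.binomial(1, 0.5, size=800)
Y_tr = 1.0 * A_tr + X_tr[:, 0] + rng.normal(size=800)
X_tg = rng.normal(size=(4000, 2))
print(ipsw_ate_nonparticipants(X_tr, A_tr, Y_tr, X_tg))   # close to 1.0
```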

9.
Stat Sin ; 33(2): 685-704, 2023 Apr.
Article in English | MEDLINE | ID: mdl-37234206

ABSTRACT

In this paper, we consider a class of partially linear transformation models with interval-censored competing risks data. Under a semiparametric generalized odds rate specification for the cause-specific cumulative incidence function, we obtain optimal estimators of the large number of parametric and nonparametric model components via maximizing the likelihood function over a joint B-spline and Bernstein polynomial spanned sieve space. Our specification considers a relatively simple finite-dimensional parameter space that approximates the infinite-dimensional parameter space as n → ∞, thereby allowing us to study the almost sure consistency and rate of convergence of all parameters, as well as the asymptotic distributions and efficiency of the finite-dimensional components. We study the finite sample performance of our method through simulation studies under a variety of scenarios. Furthermore, we illustrate our methodology via application to a dataset on HIV-infected individuals from sub-Saharan Africa.

10.
Biometrics ; 79(4): 3038-3049, 2023 12.
Article in English | MEDLINE | ID: mdl-36988158

ABSTRACT

This work considers targeted maximum likelihood estimation (TMLE) of treatment effects on absolute risk and survival probabilities in classical time-to-event settings characterized by right-censoring and competing risks. TMLE is a general methodology combining flexible ensemble learning and semiparametric efficiency theory in a two-step procedure for substitution estimation of causal parameters. We specialize and extend the continuous-time TMLE methods for competing risks settings, proposing a targeting algorithm that iteratively updates cause-specific hazards to solve the efficient influence curve equation for the target parameter. As part of the work, we further detail and implement the recently proposed highly adaptive lasso estimator for continuous-time conditional hazards with L1-penalized Poisson regression. The resulting estimation procedure benefits from relying solely on very mild nonparametric restrictions on the statistical model, thus providing a novel tool for machine-learning-based semiparametric causal inference for continuous-time time-to-event data. We apply the methods to a publicly available dataset on follicular cell lymphoma where subjects are followed over time until disease relapse or death without relapse. The data display important time-varying effects that can be captured by the highly adaptive lasso. In our simulations that are designed to imitate the data, we compare our methods to a similar approach based on random survival forests and to the discrete-time TMLE.


Subjects
Algorithms, Statistical Models, Humans, Likelihood Functions, Machine Learning, Recurrence
11.
Biometrics ; 79(4): 2920-2932, 2023 12.
Article in English | MEDLINE | ID: mdl-36645310

ABSTRACT

When there are resource constraints, it may be necessary to rank individualized treatment benefits to facilitate the prioritization of assigning different treatments. Most existing literature on individualized treatment rules targets absolute conditional treatment effect differences as a metric for the benefit. However, there can be settings where relative differences may better represent such benefit. In this paper, we consider modeling such relative differences formed as scale-invariant contrasts between the conditional treatment effects. By showing that all scale-invariant contrasts are monotonic transformations of each other, we posit a single index model for a particular relative contrast. We then characterize semiparametric estimating equations, including the efficient score, to estimate index parameters. To achieve semiparametric efficiency, we propose a two-step approach that minimizes a doubly robust loss function for initial estimation and then performs a one-step efficiency augmentation procedure. Careful theoretical and numerical studies are provided to show the superiority of our proposed approach.


Subjects
Statistical Models, Precision Medicine, Precision Medicine/methods
12.
Stat Med ; 2023 Jan 03.
Article in English | MEDLINE | ID: mdl-36597179

ABSTRACT

We consider the conditional treatment effect for competing risks data in observational studies. We derive the efficient score for the treatment effect using modern semiparametric theory, as well as two doubly robust scores with respect to (1) the assumed propensity score for treatment and the censoring model, and (2) the outcome models for the competing risks. An important property regarding the estimators is rate double robustness, in addition to the classical model double robustness. Rate double robustness enables the use of machine learning and nonparametric methods in order to estimate the nuisance parameters, while preserving the root-n asymptotic normality of the estimated treatment effect for inferential purposes. We study the performance of the estimators using simulation. The estimators are applied to the data from a cohort of Japanese men in Hawaii followed since the 1960s in order to study the effect of mid-life drinking behavior on late life cognitive outcomes. The approaches developed in this article are implemented in the R package "HazardDiff".

13.
Biometrics ; 79(3): 1686-1700, 2023 09.
Article in English | MEDLINE | ID: mdl-36314379

ABSTRACT

Owing to its robustness properties, marginal interpretations, and ease of implementation, the pseudo-partial likelihood method proposed in the seminal papers of Pepe and Cai and Lin et al. has become the default approach for analyzing recurrent event data with Cox-type proportional rate models. However, the construction of the pseudo-partial score function ignores the dependency among recurrent events and thus can be inefficient. An attempt to investigate the asymptotic efficiency of weighted pseudo-partial likelihood estimation found that the optimal weight function involves the unknown variance-covariance process of the recurrent event process and may not have closed-form expression. Thus, instead of deriving the optimal weights, we propose to combine a system of pre-specified weighted pseudo-partial score equations via the generalized method of moments and empirical likelihood estimation. We show that a substantial efficiency gain can be easily achieved without imposing additional model assumptions. More importantly, the proposed estimation procedures can be implemented with existing software. Theoretical and numerical analyses show that the empirical likelihood estimator is more appealing than the generalized method of moments estimator when the sample size is sufficiently large. An analysis of readmission risk in colorectal cancer patients is presented to illustrate the proposed methodology.


Subjects
Software, Humans, Computer Simulation, Proportional Hazards Models, Probability, Sample Size
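This entry combines pre-specified weighted score equations via the generalized method of moments; as a generic illustration of the two-step GMM recipe (identity weighting, then reweighting by the inverse moment covariance), the sketch below over-identifies the mean of an exponential with two moment conditions. It is a textbook example, not the recurrent-event score equations of the paper.

```python
import numpy as np
from scipy.optimize import minimize

def gmm_two_step(x, theta0=1.0):
    """Two-step GMM for the mean of an exponential, using two moment conditions:
    E[x - theta] = 0 and E[x^2 - 2*theta^2] = 0 (2 moments, 1 parameter)."""
    def moments(theta):
        return np.column_stack([x - theta, x ** 2 - 2.0 * theta ** 2])

    def objective(theta, W):
        gbar = moments(theta[0]).mean(axis=0)
        return gbar @ W @ gbar

    # Step 1: identity weighting matrix.
    step1 = minimize(objective, x0=[theta0], args=(np.eye(2),), method="Nelder-Mead")
    # Step 2: reweight by the inverse covariance of the moments at the step-1 estimate.
    W = np.linalg.inv(np.cov(moments(step1.x[0]), rowvar=False))
    step2 = minimize(objective, x0=step1.x, args=(W,), method="Nelder-Mead")
    return step2.x[0]

rng = np.random.default_rng(6)
x = rng.exponential(scale=2.0, size=2000)
print(gmm_two_step(x))    # close to the true mean 2.0
```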
14.
Biostatistics ; 24(3): 686-707, 2023 Jul 14.
Article in English | MEDLINE | ID: mdl-35102366

ABSTRACT

Causal mediation analysis has historically been limited in two important ways: (i) a focus has traditionally been placed on binary exposures and static interventions and (ii) direct and indirect effect decompositions have been pursued that are only identifiable in the absence of intermediate confounders affected by exposure. We present a theoretical study of an (in)direct effect decomposition of the population intervention effect, defined by stochastic interventions jointly applied to the exposure and mediators. In contrast to existing proposals, our causal effects can be evaluated regardless of whether an exposure is categorical or continuous and remain well-defined even in the presence of intermediate confounders affected by exposure. Our (in)direct effects are identifiable without a restrictive assumption on cross-world counterfactual independencies, allowing for substantive conclusions drawn from them to be validated in randomized controlled trials. Beyond the novel effects introduced, we provide a careful study of nonparametric efficiency theory relevant for the construction of flexible, multiply robust estimators of our (in)direct effects, while avoiding undue restrictions induced by assuming parametric models of nuisance parameter functionals. To complement our nonparametric estimation strategy, we introduce inferential techniques for constructing confidence intervals and hypothesis tests, and discuss open-source software, the medshift R package, implementing the proposed methodology. Application of our (in)direct effects and their nonparametric estimators is illustrated using data from a comparative effectiveness trial examining the direct and indirect effects of pharmacological therapeutics on relapse to opioid use disorder.


Subjects
Mediation Analysis, Statistical Models, Humans, Theoretical Models, Causality
15.
Biometrics ; 79(2): 1029-1041, 2023 06.
Article in English | MEDLINE | ID: mdl-35839293

ABSTRACT

Inverse-probability-weighted estimators are the oldest and potentially most commonly used class of procedures for the estimation of causal effects. By adjusting for selection biases via a weighting mechanism, these procedures estimate an effect of interest by constructing a pseudopopulation in which selection biases are eliminated. Despite their ease of use, these estimators require the correct specification of a model for the weighting mechanism, are known to be inefficient, and suffer from the curse of dimensionality. We propose a class of nonparametric inverse-probability-weighted estimators in which the weighting mechanism is estimated via undersmoothing of the highly adaptive lasso, a nonparametric regression function proven to converge at a nearly n^(-1/3) rate to the true weighting mechanism. We demonstrate that our estimators are asymptotically linear with variance converging to the nonparametric efficiency bound. Unlike doubly robust estimators, our procedures require neither derivation of the efficient influence function nor specification of the conditional outcome model. Our theoretical developments have broad implications for the construction of efficient inverse-probability-weighted estimators in large statistical models and a variety of problem settings. We assess the practical performance of our estimators in simulation studies and demonstrate use of our proposed methodology with data from a large-scale epidemiologic study.


Subjects
Statistical Models, Probability, Computer Simulation, Selection Bias, Causality
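A sketch of the plain Hájek-type inverse-probability-weighted estimator that this entry makes nonparametric; here the weighting mechanism is estimated with gradient boosting as a stand-in for the undersmoothed highly adaptive lasso, which this sketch does not implement.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def ipw_ate(X, A, Y):
    """Hajek-type inverse-probability-weighted estimate of E[Y(1)] - E[Y(0)]."""
    ps = GradientBoostingClassifier().fit(X, A).predict_proba(X)[:, 1]
    ps = np.clip(ps, 0.01, 0.99)                       # guard against extreme weights
    w1, w0 = A / ps, (1 - A) / (1 - ps)
    return np.sum(w1 * Y) / np.sum(w1) - np.sum(w0 * Y) / np.sum(w0)

# Example with a nonlinear outcome surface and confounding through X[:, 0].
rng = np.random.default_rng(7)
n = 3000
X = rng.normal(size=(n, 2))
A = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
Y = 1.0 * A + np.sin(X[:, 0]) + rng.normal(size=n)
print(ipw_ate(X, A, Y))                                # close to the true effect 1.0
```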
16.
Biometrics ; 79(2): 1213-1225, 2023 06.
Article in English | MEDLINE | ID: mdl-34862966

ABSTRACT

Complementary features of randomized controlled trials (RCTs) and observational studies (OSs) can be used jointly to estimate the average treatment effect of a target population. We propose a calibration weighting estimator that enforces the covariate balance between the RCT and OS, therefore improving the trial-based estimator's generalizability. Exploiting semiparametric efficiency theory, we propose a doubly robust augmented calibration weighting estimator that achieves the efficiency bound derived under the identification assumptions. A nonparametric sieve method is provided as an alternative to the parametric approach, which enables the robust approximation of the nuisance functions and data-adaptive selection of outcome predictors for calibration. We establish asymptotic results and confirm the finite sample performances of the proposed estimators by simulation experiments and an application on the estimation of the treatment effect of adjuvant chemotherapy for early-stage non-small cell lung cancer patients after surgery.


Subjects
Statistical Models, Humans, Computer Simulation
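A minimal calibration-weighting sketch in the spirit of this entry: exponential-tilting weights for the trial sample are obtained by solving a convex dual problem so that the weighted trial covariate means match the target-population means (an entropy-balancing-style illustration, not the authors' augmented estimator). The resulting weights would then multiply the usual within-trial treatment contrast.

```python
import numpy as np
from scipy.optimize import minimize

def calibration_weights(X_trial, target_means):
    """Exponential-tilting weights w_i proportional to exp(lambda' X_i) chosen so that
    the weighted trial covariate means equal target_means (entropy-balancing dual)."""
    Xc = X_trial - target_means                        # center at the target means
    def dual(lam):
        return np.log(np.sum(np.exp(Xc @ lam)))        # convex dual objective
    lam = minimize(dual, x0=np.zeros(X_trial.shape[1]), method="BFGS").x
    w = np.exp(Xc @ lam)
    return w / w.sum()

# Example: trial covariates are shifted relative to the target population.
rng = np.random.default_rng(8)
X_trial = rng.normal(loc=0.5, size=(500, 2))
target = np.array([0.0, 0.0])                          # target-population covariate means
w = calibration_weights(X_trial, target)
print((w[:, None] * X_trial).sum(axis=0))              # approximately the target means (0, 0)
```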
17.
Ann Stat ; 51(5): 1965-1988, 2023 Oct.
Article in English | MEDLINE | ID: mdl-38405375

ABSTRACT

This paper develops a new approach to post-selection inference for screening high-dimensional predictors of survival outcomes. Post-selection inference for right-censored outcome data has been investigated in the literature, but much remains to be done to make the methods both reliable and computationally scalable in high dimensions. Machine learning tools are commonly used to provide predictions of survival outcomes, but the estimated effect of a selected predictor suffers from confirmation bias unless the selection is taken into account. The new approach involves the construction of semi-parametrically efficient estimators of the linear association between the predictors and the survival outcome, which are used to build a test statistic for detecting the presence of an association between any of the predictors and the outcome. Further, a stabilization technique reminiscent of bagging allows a normal calibration for the resulting test statistic, which enables the construction of confidence intervals for the maximal association between predictors and the outcome and also greatly reduces computational cost. Theoretical results show that this testing procedure is valid even when the number of predictors grows superpolynomially with sample size, and our simulations support this asymptotic guarantee at moderate sample sizes. The new approach is applied to the problem of identifying patterns in viral gene expression associated with the potency of an antiviral drug.

18.
Lifetime Data Anal ; 2022 Nov 07.
Article in English | MEDLINE | ID: mdl-36336732

ABSTRACT

Targeted maximum likelihood estimation (TMLE) provides a general methodology for estimation of causal parameters in the presence of high-dimensional nuisance parameters. Generally, TMLE consists of a two-step procedure that combines data-adaptive nuisance parameter estimation with semiparametric efficiency and rigorous statistical inference obtained via a targeted update step. In this paper, we demonstrate the practical applicability of TMLE-based causal inference in survival and competing risks settings where event times are not confined to take place on a discrete and finite grid. We focus on estimation of causal effects of time-fixed treatment decisions on survival and absolute risk probabilities, considering different univariate and multidimensional parameters. Besides providing general guidance on using TMLE for survival and competing risks analysis, we further describe how the previous work can be extended with the use of loss-based cross-validated estimation, also known as super learning, of the conditional hazards. We illustrate the usage of the considered methods using publicly available data from a trial on adjuvant chemotherapy for colon cancer. R software code to implement all considered algorithms and to reproduce all analyses is available in an accompanying online appendix on Github.

19.
J Causal Inference ; 10(1): 415-440, 2022.
Article in English | MEDLINE | ID: mdl-37637433

ABSTRACT

In the presence of heterogeneity between the randomized controlled trial (RCT) participants and the target population, evaluating the treatment effect solely based on the RCT often leads to biased quantification of the real-world treatment effect. To address the problem of lack of generalizability for the treatment effect estimated by the RCT sample, we leverage observational studies with large samples that are representative of the target population. This article concerns evaluating treatment effects on survival outcomes for a target population and considers a broad class of estimands that are functionals of treatment-specific survival functions, including differences in survival probability and restricted mean survival times. Motivated by two intuitive but distinct approaches, i.e., imputation based on survival outcome regression and weighting based on inverse probability of sampling, censoring, and treatment assignment, we propose a semiparametric estimator through the guidance of the efficient influence function. The proposed estimator is doubly robust in the sense that it is consistent for the target population estimands if either the survival model or the weighting model is correctly specified and is locally efficient when both are correct. In addition, as an alternative to parametric estimation, we employ the nonparametric method of sieves for flexible and robust estimation of the nuisance functions and show that the resulting estimator retains the root-n consistency and efficiency, the so-called rate-double robustness. Simulation studies confirm the theoretical properties of the proposed estimator and show that it outperforms competitors. We apply the proposed method to estimate the effect of adjuvant chemotherapy on survival in patients with early-stage resected non-small cell lung cancer.

20.
Biometrics ; 78(4): 1674-1685, 2022 12.
Article in English | MEDLINE | ID: mdl-34213008

ABSTRACT

Persons living with HIV engage in routine clinical care, generating large amounts of data in observational HIV cohorts. These data are often error-prone, and directly using them in biomedical research could bias estimation and give misleading results. A cost-effective solution is the two-phase design, under which the error-prone variables are observed for all patients during Phase I, and that information is used to select patients for data auditing during Phase II. For example, the Caribbean, Central, and South America network for HIV epidemiology (CCASAnet) selected a random sample from each site for data auditing. Herein, we consider efficient odds ratio estimation with partially audited, error-prone data. We propose a semiparametric approach that uses all information from both phases and accommodates a number of error mechanisms. We allow both the outcome and covariates to be error-prone and these errors to be correlated, and selection of the Phase II sample can depend on Phase I data in an arbitrary manner. We devise a computationally efficient, numerically stable EM algorithm to obtain estimators that are consistent, asymptotically normal, and asymptotically efficient. We demonstrate the advantages of the proposed methods over existing ones through extensive simulations. Finally, we provide applications to the CCASAnet cohort.


Subjects
HIV Infections, Research Design, Humans, Odds Ratio, Bias, Statistical Data Interpretation, HIV Infections/epidemiology