Results 1 - 20 of 43
1.
Biostatistics ; 24(3): 760-775, 2023 Jul 14.
Article in English | MEDLINE | ID: mdl-35166342

ABSTRACT

Leveraging large-scale electronic health record (EHR) data to estimate survival curves for clinical events can enable more powerful risk estimation and comparative effectiveness research. However, use of EHR data is hindered by a lack of direct event time observations. Occurrence times of relevant diagnostic codes or target disease mentions in clinical notes are at best a good approximation of the true disease onset time. On the other hand, extracting precise information on the exact event time requires laborious manual chart review and is sometimes altogether infeasible due to a lack of detailed documentation. Current status labels, binary indicators of phenotype status during follow-up, are significantly more efficient and feasible to compile, enabling more precise survival curve estimation given limited resources. Existing survival analysis methods using current status labels focus almost entirely on supervised estimation, and naive incorporation of unlabeled data into these methods may lead to biased estimates. In this article, we propose Semisupervised Calibration of Risk with Noisy Event Times (SCORNET), which yields a consistent and efficient survival function estimator by leveraging a small set of current status labels and a large set of informative features. In addition to providing theoretical justification of SCORNET, we demonstrate in both simulation and real-world EHR settings that SCORNET achieves efficiency akin to the parametric Weibull regression model, while also exhibiting semi-nonparametric flexibility and relatively low empirical bias in a variety of generative settings.


Subject(s)
Electronic Health Records , Humans , Calibration , Bias , Computer Simulation
2.
Biometrics ; 80(2)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38563532

ABSTRACT

Deep learning has continuously attained huge success in diverse fields, while its application to survival data analysis remains limited and deserves further exploration. For the analysis of current status data, a deep partially linear Cox model is proposed to circumvent the curse of dimensionality. Modeling flexibility is attained by using deep neural networks (DNNs) to accommodate nonlinear covariate effects and monotone splines to approximate the baseline cumulative hazard function. We establish the convergence rate of the proposed maximum likelihood estimators. Moreover, we derive that the finite-dimensional estimator for treatment covariate effects is $\sqrt{n}$-consistent, asymptotically normal, and attains semiparametric efficiency. Finally, we demonstrate the performance of our procedures through extensive simulation studies and application to real-world data on news popularity.


Subject(s)
Proportional Hazards Models , Likelihood Functions , Survival Analysis , Computer Simulation , Linear Models
3.
Stat Med ; 43(9): 1726-1742, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38381059

ABSTRACT

Current status data are a type of failure time data that arise when the failure time of a study subject cannot be determined precisely but is known only to occur before or after a random monitoring time. Variable selection methods for the failure time data have been discussed extensively in the literature. However, the statistical inference of the model selected based on the variable selection method ignores the uncertainty caused by model selection. To enhance the prediction accuracy for risk quantities such as survival probability, we propose two optimal model averaging methods under semiparametric additive hazards models. Specifically, based on martingale residuals processes, a delete-one cross-validation (CV) process is defined, and two new CV functional criteria are derived for choosing model weights. Furthermore, we present a greedy algorithm for the implementation of the techniques, and the asymptotic optimality of the proposed model averaging approaches is established, along with the convergence of the greedy averaging algorithms. A series of simulation experiments demonstrate the effectiveness and superiority of the proposed methods. Finally, a real-data example is provided as an illustration.


Subject(s)
Algorithms , Models, Statistical , Humans , Proportional Hazards Models , Computer Simulation , Probability
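
The definition of current status data in the abstract above can be made concrete with a small sketch. It is not the model averaging method of the article: it only shows how a pair (monitoring time, status) is formed and how such pairs enter a likelihood. The exponential failure-time model and the function names `to_current_status` and `current_status_loglik` are assumptions made for illustration.

```python
import math


def to_current_status(failure_times, monitor_times):
    """Reduce (possibly unobserved) failure times T to status indicators.

    Only delta = 1 if T <= C, else 0, is recorded at the monitoring time C.
    """
    return [1 if t <= c else 0 for t, c in zip(failure_times, monitor_times)]


def current_status_loglik(rate, monitor_times, statuses):
    """Log-likelihood under an ASSUMED exponential model, T ~ Exponential(rate).

    A subject with delta = 1 contributes log F(C) and one with delta = 0
    contributes log S(C), where S(C) = exp(-rate * C) and F = 1 - S.
    Monitoring times must be strictly positive.
    """
    total = 0.0
    for c, delta in zip(monitor_times, statuses):
        surv = math.exp(-rate * c)  # S(C) = P(T > C)
        total += math.log(1.0 - surv) if delta == 1 else math.log(surv)
    return total


# Hypothetical data: two subjects monitored at times 2.0 and 3.0.
statuses = to_current_status([1.0, 5.0], [2.0, 3.0])
loglik = current_status_loglik(0.5, [2.0, 3.0], statuses)
```

The exact failure times never appear in the likelihood, which is why current status data carry less information than right-censored data with observed event times.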
4.
J Biomed Inform ; 157: 104685, 2024 Jul 14.
Article in English | MEDLINE | ID: mdl-39004109

ABSTRACT

BACKGROUND: Risk prediction plays a crucial role in planning for prevention, monitoring, and treatment. Electronic Health Records (EHRs) offer an expansive repository of temporal medical data encompassing both risk factors and outcome indicators essential for effective risk prediction. However, challenges emerge due to the lack of readily available gold-standard outcomes and the complex effects of various risk factors. Compounding these challenges are the false positives in diagnosis codes and the formidable task of pinpointing the onset timing in annotations. OBJECTIVE: We develop a Semi-supervised Double Deep Learning Temporal Risk Prediction (SeDDLeR) algorithm based on extensive unlabeled longitudinal Electronic Health Records (EHR) data augmented by a limited set of gold standard labels on the binary status information indicating whether the clinical event of interest occurred during the follow-up period. METHODS: The SeDDLeR algorithm calculates an individualized risk of developing future clinical events over time using each patient's baseline EHR features via the following steps: (1) construction of an initial EHR-derived surrogate as a proxy for the onset status; (2) deep learning calibration of the surrogate along gold-standard onset status; and (3) semi-supervised deep learning for risk prediction combining calibrated surrogates and gold-standard onset status. To account for missing onset time and heterogeneous follow-up, we introduce temporal kernel weighting. We devise a Gated Recurrent Units (GRUs) module to capture temporal characteristics. We subsequently assess our proposed SeDDLeR method in simulation studies and apply the method to the Massachusetts General Brigham (MGB) Biobank to predict type 2 diabetes (T2D) risk. RESULTS: SeDDLeR outperforms benchmark risk prediction methods, including Semi-parametric Transformation Model (STM) and DeepHit, with consistently the best accuracy across experiments. SeDDLeR achieved the best C-statistics (0.815, SE 0.023; vs STM +0.084, SE 0.030, P-value 0.004; vs DeepHit +0.055, SE 0.027, P-value 0.024) and best average time-specific AUC (0.778, SE 0.022; vs STM +0.059, SE 0.039, P-value 0.067; vs DeepHit +0.168, SE 0.032, P-value <0.001) in the MGB T2D study. CONCLUSION: SeDDLeR can train robust risk prediction models in both real-world EHR and synthetic datasets with minimal requirements of labeling event times. It holds the potential to be incorporated for future clinical trial recruitment or clinical decision-making.

5.
Lifetime Data Anal ; 2024 Aug 24.
Article in English | MEDLINE | ID: mdl-39180601

ABSTRACT

This paper discusses regression analysis of current status data with dependent censoring, a problem that often occurs in many areas such as cross-sectional studies, epidemiological investigations and tumorigenicity experiments. Copula model-based methods are commonly employed to tackle this issue. However, these methods often face challenges in terms of model and parameter identification. The primary aim of this paper is to propose a copula-based analysis for dependent current status data, where the association parameter is left unspecified. Our method is based on a general class of semiparametric linear transformation models and parametric copulas. We demonstrate that the proposed semiparametric model is identifiable under certain regularity conditions from the distribution of the observed data. For inference, we develop a sieve maximum likelihood estimation method, using Bernstein polynomials to approximate the nonparametric functions involved. The asymptotic consistency and normality of the proposed estimators are established. Finally, to demonstrate the effectiveness and practical applicability of our method, we conduct an extensive simulation study and apply the proposed method to a real data example.

6.
Biometrics ; 79(1): 190-202, 2023 03.
Article in English | MEDLINE | ID: mdl-34747010

ABSTRACT

Readily available proxies for the time of disease onset such as the time of the first diagnostic code can lead to substantial risk prediction error if performing analyses based on poor proxies. Due to the lack of detailed documentation and labor intensiveness of manual annotation, it is often only feasible to ascertain for a small subset the current status of the disease by a follow-up time rather than the exact time. In this paper, we aim to develop risk prediction models for the onset time efficiently leveraging both a small number of labels on the current status and a large number of unlabeled observations on imperfect proxies. Under a semiparametric transformation model for onset and a highly flexible measurement error model for proxy onset time, we propose the semisupervised risk prediction method by combining information from proxies and limited labels efficiently. From an initial estimator solely based on the labeled subset, we perform a one-step correction with the full data augmenting against a mean zero rank correlation score derived from the proxies. We establish the consistency and asymptotic normality of the proposed semisupervised estimator and provide a resampling procedure for interval estimation. Simulation studies demonstrate that the proposed estimator performs well in a finite sample. We illustrate the proposed estimator by developing a genetic risk prediction model for obesity using data from Mass General Brigham Healthcare Biobank.


Subject(s)
Algorithms , Electronic Health Records , Computer Simulation , Risk Factors
7.
Stat Med ; 42(8): 1207-1232, 2023 04 15.
Article in English | MEDLINE | ID: mdl-36690474

ABSTRACT

We consider the design and analysis of two-phase studies aiming to assess the relation between a fixed (eg, genetic) marker and an event time under current status observation. We consider a common setting in which a phase I sample is comprised of a large cohort of individuals with outcome (ie, current status) data and a vector of inexpensive covariates. Stored biospecimens for individuals in the phase I sample can be assayed to record the marker of interest for individuals selected in a phase II sub-sample. The design challenge is then to select the phase II sub-sample in order to maximize the precision of the marker effect on the time of interest under a proportional hazards model. This problem has not been examined before for current status data and the role of the assessment time is highlighted. Inference based on likelihood and inverse probability weighted estimating functions is considered, with designs centered on score-based residuals, extreme current status observations, or stratified sampling schemes. Data from a registry of patients with psoriatic arthritis are used in an illustration where we study the risk of diabetes as a comorbidity.


Subject(s)
Arthritis, Psoriatic , Research Design , Humans , Computer Simulation , Proportional Hazards Models , Probability
8.
Stat Med ; 42(24): 4440-4457, 2023 Oct 30.
Article in English | MEDLINE | ID: mdl-37574218

ABSTRACT

Current status data arise when each subject under study is examined only once at an observation time, and one only knows the failure status of the event of interest at the observation time rather than the exact failure time. Moreover, the obtained failure status is frequently subject to misclassification due to imperfect tests, yielding misclassified current status data. This article conducts regression analysis of such data with the semiparametric probit model, which serves as an important alternative to existing semiparametric models and has recently received considerable attention in failure time data analysis. We consider the nonparametric maximum likelihood estimation and develop an expectation-maximization (EM) algorithm by incorporating the generalized pool-adjacent-violators (PAV) algorithm to maximize the intractable likelihood function. The resulting estimators of regression parameters are shown to be consistent, asymptotically normal, and semiparametrically efficient. Furthermore, the numerical results in simulation studies indicate that the proposed method performs satisfactorily in finite samples and outperforms the naive method that ignores misclassification. We then apply the proposed method to a real dataset on chlamydia infection.
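
To illustrate why ignoring misclassification biases a naive analysis, the sketch below relates the probability of an observed positive status to the true failure probability F(C). It assumes a test with known, constant sensitivity and specificity; these parameters and the function names are illustrative assumptions, not quantities taken from the article, and the sketch is not the article's EM procedure.

```python
def observed_positive_prob(true_prob, sensitivity, specificity):
    """P(observed status = 1) when the true status is 1 with probability true_prob.

    True positives are detected with probability `sensitivity`; true negatives
    are wrongly flagged with probability 1 - `specificity`.
    """
    return sensitivity * true_prob + (1.0 - specificity) * (1.0 - true_prob)


def corrected_true_prob(observed_prob, sensitivity, specificity):
    """Invert the relation above; requires sensitivity + specificity > 1."""
    return (observed_prob - (1.0 - specificity)) / (sensitivity + specificity - 1.0)


# Hypothetical values: F(C) = 0.5, sensitivity 0.9, specificity 0.8.
observed = observed_positive_prob(0.5, 0.9, 0.8)
recovered = corrected_true_prob(observed, 0.9, 0.8)
```

With a perfect test (sensitivity and specificity both 1) the observed and true probabilities coincide, which is the ordinary current status setting.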

9.
Stat Med ; 42(26): 4886-4896, 2023 11 20.
Article in English | MEDLINE | ID: mdl-37652042

ABSTRACT

The approximate Bernstein polynomial model, a mixture of beta distributions, is applied to obtain maximum likelihood estimates of the regression coefficients, the baseline density and the survival functions in an accelerated failure time model based on interval censored data including current status data. The estimators of the regression coefficients and the underlying baseline density function are shown to be consistent with almost parametric rates of convergence under some conditions for uncensored and/or interval censored data. Simulation shows that the proposed method is better than its competitors. The proposed method is illustrated by fitting the Breast Cosmetic and the HIV infection time data using the accelerated failure time model.


Subject(s)
HIV Infections , Humans , Likelihood Functions , HIV Infections/drug therapy , Models, Statistical , Computer Simulation , Time Factors
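
The abstract above describes the Bernstein polynomial model as a mixture of beta distributions. A minimal sketch of such a density on the unit interval follows; the degree, the weights, and the function name `bernstein_density` are assumptions for illustration, and the article's estimation procedure and accelerated failure time structure are not reproduced.

```python
import math


def bernstein_density(t, weights):
    """Degree-m Bernstein polynomial density at t in [0, 1].

    f(t) = sum_i p_i * beta(t; i + 1, m - i + 1), with m = len(weights) - 1.
    For integer parameters the beta density has the closed form
    (m + 1) * C(m, i) * t**i * (1 - t)**(m - i).
    The weights p_i are assumed nonnegative and summing to one.
    """
    m = len(weights) - 1
    return sum(
        p * (m + 1) * math.comb(m, i) * t**i * (1.0 - t) ** (m - i)
        for i, p in enumerate(weights)
    )


# Equal weights reproduce the uniform density on [0, 1].
uniform_value = bernstein_density(0.3, [0.25, 0.25, 0.25, 0.25])
```

Failure times on another range would first need rescaling to the unit interval, which is an additional assumption of this sketch.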
10.
Stat Med ; 41(18): 3561-3578, 2022 Aug 15.
Article in English | MEDLINE | ID: mdl-35608143

ABSTRACT

We consider survival data that combine three types of observations: uncensored, right-censored, and left-censored. Such data arise from screening a medical condition, in situations where self-detection arises naturally. Our goal is to estimate the failure-time distribution, based on these three observation types. We propose a novel methodology for distribution estimation using both semiparametric and nonparametric techniques. We then evaluate the performance of these estimators via simulated data. Finally, as a case study, we estimate the patience of patients who arrive at an emergency department and wait for treatment. Three categories of patients are observed: those who leave the system and announce it, and thus their patience time is observed; those who get service and thus their patience time is right-censored by the waiting time; and those who leave the system without announcing it. For this third category, the patients' absence is revealed only when they are called to service, which is after they have already left; formally, their patience time is left-censored. Other applications of our proposed methodology are discussed.
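
The three observation types described above each contribute a different term to a likelihood: a density for an exact time, a survival probability for a right-censored time, and a distribution function for a left-censored time. The sketch below shows this under an assumed exponential model; the article itself uses semiparametric and nonparametric estimators, and the labels "exact", "right", and "left" and the function name are hypothetical.

```python
import math


def mixed_censoring_loglik(rate, observations):
    """Log-likelihood for (time, kind) pairs under an ASSUMED Exponential(rate) model.

    kind == "exact": event time observed, contributes log f(t)
    kind == "right": event after t, contributes log S(t)
    kind == "left":  event before t, contributes log F(t)
    """
    total = 0.0
    for time, kind in observations:
        if kind == "exact":
            total += math.log(rate) - rate * time
        elif kind == "right":
            total += -rate * time
        elif kind == "left":
            total += math.log(1.0 - math.exp(-rate * time))
        else:
            raise ValueError(f"unknown observation kind: {kind!r}")
    return total


# Hypothetical emergency-department sample: one announced departure,
# one patient served after waiting 2.0, one silent departure found at 1.0.
sample = [(1.0, "exact"), (2.0, "right"), (1.0, "left")]
value = mixed_censoring_loglik(1.0, sample)
```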

11.
Lifetime Data Anal ; 28(4): 659-674, 2022 10.
Article in English | MEDLINE | ID: mdl-35748999

ABSTRACT

Cross-sectionally sampled data with binary disease outcome are commonly analyzed in observational studies to identify the relationship between covariates and disease outcome. A cross-sectional population is defined as a population of living individuals at the sampling or observational time. It is generally understood that binary disease outcome from cross-sectional data contains less information than longitudinally collected time-to-event data, but there is insufficient understanding as to whether bias can possibly exist in cross-sectional data and how the bias is related to the population risk of interest. Wang and Yang (2021) presented the complexity and bias in cross-sectional data with binary disease outcome with detailed analytical explorations into the data structure. As the distribution of the cross-sectional binary outcome is quite different from the population risk distribution, bias can arise when using cross-sectional data analysis to draw inference for population risk. In this paper we argue that the commonly adopted age-specific risk probability is biased for the estimation of population risk and propose an outcome reassignment (OR) approach which reassigns a portion of the observed binary outcome, 0 or 1, to the other disease category. A sign test and a semiparametric pseudo-likelihood method are developed for analyzing cross-sectional data using the OR approach. Simulations and an analysis based on Alzheimer's Disease data are presented to illustrate the proposed methods.


Subject(s)
Models, Statistical , Bias , Causality , Computer Simulation , Cross-Sectional Studies , Humans
12.
Biostatistics ; 21(4): 876-894, 2020 10 01.
Article in English | MEDLINE | ID: mdl-31086969

ABSTRACT

In a cross-sectional study, adolescent and young adult females were asked to recall the time of menarche, if experienced. Some respondents recalled the date exactly, some recalled only the month or the year of the event, and some were unable to recall anything. We consider estimation of the menarcheal age distribution from these interval-censored data. A complicated interplay between age-at-event and calendar time, together with the evident fact of memory fading with time, makes the censoring informative. We propose a model where the probabilities of various types of recall would depend on the time since menarche. For parametric estimation, we model these probabilities using a multinomial regression function. Establishing consistency and asymptotic normality of the parametric maximum likelihood estimator requires a bit of tweaking of the standard asymptotic theory, as the data format varies from case to case. We also provide a non-parametric maximum likelihood estimator, propose a computationally simpler approximation, and establish the consistency of both these estimators under mild conditions. We study the small sample performance of the parametric and non-parametric estimators through Monte Carlo simulations. Moreover, we provide a graphical check of the assumption of the multinomial model for the recall probabilities, which appears to hold for the menarcheal data set. Our analysis shows that the use of the partially recalled part of the data indeed leads to smaller confidence intervals of the survival function.


Subject(s)
Cross-Sectional Studies , Adolescent , Age Distribution , Female , Humans , Monte Carlo Method , Probability , Young Adult
13.
Biometrics ; 77(2): 599-609, 2021 06.
Article in English | MEDLINE | ID: mdl-32562264

ABSTRACT

Panel current status data arise frequently in biomedical studies when the occurrence of a particular clinical condition is only examined at several prescheduled visit times. Existing methods for analyzing current status data have largely focused on regression modeling based on commonly used survival models such as the proportional hazards model and the accelerated failure time model. However, these procedures have the limitations of being difficult to implement and performing sub-optimally in relatively small sample sizes. The performance of these procedures is also unclear under model misspecification. In addition, no methods currently exist to evaluate the prediction performance of estimated risk models with panel current status data. In this paper, we propose a simple estimator under a general class of nonparametric transformation (NPT) models by fitting a logistic regression working model and demonstrate that our proposed estimator is consistent for the NPT model parameter up to a scale multiplier. Furthermore, we propose nonparametric estimators for evaluating the prediction performance of the risk score derived from model fitting, which is valid regardless of the adequacy of the fitted model. Extensive simulation results suggest that our proposed estimators perform well in finite samples and the regression parameter estimators outperform existing estimators under various scenarios. We illustrate the proposed procedures using data from the Framingham Offspring Study.


Subject(s)
Proportional Hazards Models , Computer Simulation , Logistic Models , Sample Size
14.
Stat Med ; 40(10): 2400-2412, 2021 05 10.
Article in English | MEDLINE | ID: mdl-33586218

ABSTRACT

This research is motivated by a periodontal disease dataset that possesses certain special features. The dataset consists of clustered current status time-to-event observations with large and varying cluster sizes, where the cluster size is associated with the disease outcome. Also, heavy censoring is present in the data even with long follow-up time, suggesting the presence of a cured subpopulation. In this paper, we propose a computationally efficient marginal approach, namely the cluster-weighted generalized estimating equation approach, to analyze the data based on a class of semiparametric transformation cure models. The parametric and nonparametric components of the model are estimated using a Bernstein-polynomial based sieve maximum pseudo-likelihood approach. The asymptotic properties of the proposed estimators are studied. Simulation studies are conducted to evaluate the performance of the proposed estimators in scenarios with different degrees of informative clustering and within-cluster dependence. The proposed method is applied to the motivating periodontal disease data for illustration.


Subject(s)
Models, Statistical , Cluster Analysis , Computer Simulation , Cost-Benefit Analysis , Humans , Likelihood Functions
15.
Stat Med ; 40(4): 950-962, 2021 02 20.
Article in English | MEDLINE | ID: mdl-33169416

ABSTRACT

A cross-sectional population is defined as a population of living individuals at the sampling or observational time. Cross-sectionally sampled data with binary disease outcome are commonly analyzed in observational studies for identifying how covariates correlate with disease occurrence. It is generally understood that cross-sectional binary outcome is not as informative as longitudinally collected time-to-event data, but there is insufficient understanding as to whether bias can possibly exist in cross-sectional data and how the bias is related to the population risk of interest. As the progression of a disease typically involves both time and disease status, we consider how the binary disease outcome from the cross-sectional population is connected to the birth-illness-death process in the target population. We argue that the distribution of cross-sectional binary outcome is different from the risk distribution from the target population and that bias would typically arise when using cross-sectional data to draw inference for population risk. In general, the cross-sectional risk probability is determined jointly by the population risk probability and the ratio of duration of diseased state to the duration of disease-free state. Through explicit formulas we conclude that bias can almost never be avoided from cross-sectional data. We present age-specific risk probability (ARP) and argue that models based on ARP offer a compromised but still biased approach to understand the population risk. An analysis based on Alzheimer's disease data is presented to illustrate the ARP model and possible critiques for the analysis results.


Subject(s)
Cross-Sectional Studies , Observational Studies as Topic , Bias , Causality , Humans , Risk Factors
16.
Lifetime Data Anal ; 27(3): 413-436, 2021 07.
Article in English | MEDLINE | ID: mdl-33895961

ABSTRACT

Current status data occur in many fields including demographical, epidemiological, financial, medical, and sociological studies. We consider the regression analysis of current status data with latent variables. The proposed model consists of a factor analytic model for characterizing latent variables through their multiple surrogates and an additive hazard model for examining potential covariate effects on the hazards of interest in the presence of current status data. We develop a borrow-strength estimation procedure that incorporates the expectation-maximization algorithm and correlated estimating equations. The consistency and asymptotic normality of the proposed estimators are established. A simulation study is conducted to evaluate the finite sample performance of the proposed method. A real-life study on the chronic kidney disease of type 2 diabetic patients is presented.


Subject(s)
Algorithms , Models, Statistical , Computer Simulation , Humans , Proportional Hazards Models , Regression Analysis
17.
Stat Med ; 38(20): 3703-3718, 2019 09 10.
Article in English | MEDLINE | ID: mdl-31197854

ABSTRACT

Variable selection is a crucial issue in model building and it has received considerable attention in the literature of survival analysis. However, available approaches in this direction have mainly focused on time-to-event data with right censoring. Moreover, a majority of existing variable selection procedures for survival models are developed in a frequentist framework. In this article, we consider the additive hazards model in the presence of current status data. We propose a Bayesian adaptive least absolute shrinkage and selection operator procedure to conduct simultaneous variable selection and parameter estimation. Efficient Markov chain Monte Carlo methods are developed to implement posterior sampling and inference. The empirical performance of the proposed method is demonstrated by simulation studies. An application to a study on the risk factors of heart failure disease for type 2 diabetes patients is presented.


Subject(s)
Bayes Theorem , Proportional Hazards Models , Computer Simulation , Humans , Markov Chains , Monte Carlo Method , Regression Analysis
18.
Stat Med ; 38(19): 3628-3641, 2019 08 30.
Article in English | MEDLINE | ID: mdl-31074119

ABSTRACT

Rodent survival-sacrifice experiments are routinely conducted to assess the tumor-inducing potential of a certain exposure or drug. Because most tumors under study are impalpable, animals are examined at death for evidence of tumor formation. In some studies, the cause of death is ascertained by a pathologist to account for possible correlation between tumor development and death. Existing methods for survival-sacrifice data with cause-of-death information have been restricted to multi-group testing or one-sample estimation of tumor onset distribution and thus do not provide a natural way to quantify treatment effect or dose-response relationship. In this paper, we propose semiparametric regression methods under the popular proportional hazards model for both tumor onset and tumor-caused death. For inference, we develop a maximum pseudo-likelihood estimation procedure using a modified iterative convex minorant algorithm, which is guaranteed to converge to the unique maximizer of the objective function. Simulation studies under different tumor rates show that the new methods provide valid inference on the covariate-outcome relationship and outperform alternative approaches. A real study investigating the effects of benzidine dihydrochloride on liver tumor in mice is analyzed as an illustration.


Subject(s)
Carcinogenicity Tests , Cause of Death , Proportional Hazards Models , Regression Analysis , Animals , Computer Simulation , Likelihood Functions , Mice , Rats
19.
Biometrics ; 74(4): 1240-1249, 2018 12.
Article in English | MEDLINE | ID: mdl-29975791

ABSTRACT

For analyzing current status data, a flexible partially linear proportional hazards model is proposed. Modeling flexibility is attained through using monotone splines to approximate the baseline cumulative hazard function, as well as B-splines to accommodate nonlinear covariate effects. To facilitate model fitting, a computationally efficient and easy to implement expectation-maximization algorithm is developed through a two-stage data augmentation process involving carefully structured latent Poisson random variables. Asymptotic normality and the efficiency of the spline estimator of the regression coefficients are established, and the spline estimators of the nonparametric components are shown to possess the optimal rate of convergence under suitable regularity conditions. The finite-sample performance of the proposed approach is evaluated through Monte Carlo simulation and it is further illustrated using uterine fibroid data arising from a prospective cohort study on early pregnancy.


Subject(s)
Algorithms , Health Status , Leiomyoma/epidemiology , Proportional Hazards Models , Adult , Computer Simulation , Female , Humans , Monte Carlo Method , Poisson Distribution , Pregnancy
20.
Biometrics ; 74(1): 68-76, 2018 03.
Article in English | MEDLINE | ID: mdl-28437561

ABSTRACT

Multivariate current-status data are frequently encountered in biomedical and public health studies. Semiparametric regression models have been extensively studied for univariate current-status data, but most existing estimation procedures are computationally intensive, involving either penalization or smoothing techniques. It becomes more challenging for the analysis of multivariate current-status data. In this article, we study the maximum likelihood estimations for univariate and bivariate current-status data under the semiparametric probit regression models. We present a simple computational procedure combining the expectation-maximization algorithm with the pool-adjacent-violators algorithm for solving the monotone constraint on the baseline function. Asymptotic properties of the maximum likelihood estimators are investigated, including the calculation of the explicit information bound for univariate current-status data, as well as the asymptotic consistency and convergence rate for bivariate current-status data. Extensive simulation studies showed that the proposed computational procedures performed well under small or moderate sample sizes. We demonstrate the estimation procedure with two real data examples in the areas of diabetic and HIV research.


Subject(s)
Likelihood Functions , Models, Statistical , Regression Analysis , Algorithms , Computer Simulation , Diabetes Mellitus , HIV Infections , Humans , Sample Size , Survival Analysis
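
The pool-adjacent-violators algorithm mentioned in the abstract above enforces a monotone constraint. As a minimal sketch of the idea, and not of the article's probit regression procedure, the code below applies it to the simplest case: with no covariates, the nonparametric maximum likelihood estimate of the failure distribution F from univariate current status data is the isotonic regression of the status indicators on the ordered monitoring times. The function name `npmle_current_status` is hypothetical.

```python
def npmle_current_status(monitor_times, statuses):
    """Covariate-free NPMLE of F at the sorted monitoring times via PAVA.

    Returns (sorted_times, fitted), where fitted is nondecreasing.
    Ties in monitoring time are ordered with events first so that the
    pooling step assigns them a common fitted value.
    """
    order = sorted(
        range(len(monitor_times)),
        key=lambda i: (monitor_times[i], -statuses[i]),
    )
    blocks = []  # each block is [sum of statuses, number of subjects]
    for i in order:
        blocks.append([float(statuses[i]), 1])
        # Pool while the previous block mean exceeds the last block mean
        # (compared by cross-multiplication to avoid division).
        while (
            len(blocks) > 1
            and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]
        ):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted = []
    for s, n in blocks:
        fitted.extend([s / n] * n)
    return [monitor_times[i] for i in order], fitted


# Hypothetical data: the status sequence 0, 1, 0, 1 violates monotonicity once.
times, f_hat = npmle_current_status([1.0, 2.0, 3.0, 4.0], [0, 1, 0, 1])
```

Here the out-of-order pair at times 2.0 and 3.0 is pooled to a common value of 0.5, giving a nondecreasing estimate.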