Results 1 - 19 of 19
1.
Stat Methods Med Res ; 33(3): 498-514, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38400526

ABSTRACT

In cancer studies, it is commonplace that a fraction of patients participating in the study are cured, such that not all of them will experience a recurrence, or death due to cancer. Also, it is plausible that some covariates, such as the treatment assigned to the patients or demographic characteristics, could affect both the patients' survival rates and cure/incidence rates. A common approach to accommodate these features in survival analysis is to consider a mixture cure survival model with the incidence rate modeled by a logistic regression model and latency part modeled by the Cox proportional hazards model. These modeling assumptions, though typical, restrict the structure of covariate effects on both the incidence and latency components. As a plausible recourse to attain flexibility, we study a class of semiparametric mixture cure models in this article, which incorporates two single-index functions for modeling the two regression components. A hybrid nonparametric maximum likelihood estimation method is proposed, where the cumulative baseline hazard function for uncured subjects is estimated nonparametrically, and the two single-index functions are estimated via Bernstein polynomials. Parameter estimation is carried out via a curated expectation-maximization algorithm. We also conducted a large-scale simulation study to assess the finite-sample performance of the estimator. The proposed methodology is illustrated via application to two cancer datasets.


Subject(s)
Models, Statistical , Neoplasms , Humans , Incidence , Proportional Hazards Models , Survival Analysis , Computer Simulation , Algorithms , Likelihood Functions
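
The "common approach" that the abstract above contrasts itself with (logistic incidence plus Cox proportional hazards latency) can be sketched as a population survival function. This is a minimal illustration of that standard mixture cure model, not the single-index model the article proposes; the function names, the coefficient vectors `beta` and `gamma`, and the linear cumulative baseline hazard in the usage note are all assumptions made for illustration.

```python
import math
from typing import Callable, Sequence


def incidence_probability(z: Sequence[float], gamma: Sequence[float]) -> float:
    """Logistic incidence: probability that a subject is uncured."""
    eta = sum(g * v for g, v in zip(gamma, z))
    return 1.0 / (1.0 + math.exp(-eta))


def population_survival(
    t: float,
    x: Sequence[float],
    z: Sequence[float],
    beta: Sequence[float],
    gamma: Sequence[float],
    cum_baseline_hazard: Callable[[float], float],
) -> float:
    """Mixture cure survival: S(t) = (1 - pi(z)) + pi(z) * S_u(t | x)."""
    pi = incidence_probability(z, gamma)
    # Cox-type latency for uncured subjects: S_u(t | x) = exp(-H0(t) * exp(beta'x))
    relative_risk = math.exp(sum(b * v for b, v in zip(beta, x)))
    s_uncured = math.exp(-cum_baseline_hazard(t) * relative_risk)
    return (1.0 - pi) + pi * s_uncured
```

With a hypothetical baseline `H0(t) = 0.5 * t`, the curve starts at 1 and levels off at the cure fraction `1 - pi(z)` rather than dropping to 0, which is the plateau that motivates cure models.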
2.
HGG Adv ; 5(1): 100245, 2024 Jan 11.
Article in English | MEDLINE | ID: mdl-37817410

ABSTRACT

Mendelian randomization has been widely used to assess the causal effect of a heritable exposure variable on an outcome of interest, using genetic variants as instrumental variables. In practice, data on the exposure variable can be incomplete due to high cost of measurement and technical limits of detection. In this paper, we propose a valid and efficient method to handle both unmeasured and undetectable values of the exposure variable in one-sample Mendelian randomization analysis with individual-level data. We estimate the causal effect of the exposure variable on the outcome using maximum likelihood estimation and develop an expectation maximization algorithm for the computation of the estimator. Simulation studies show that the proposed method performs well in making inference on the causal effect. We apply our method to the Hispanic Community Health Study/Study of Latinos, a community-based prospective cohort study, and estimate the causal effect of several metabolites on phenotypes of interest.


Subject(s)
Mendelian Randomization Analysis , Public Health , Humans , Mendelian Randomization Analysis/methods , Prospective Studies , Causality , Hispanic or Latino/genetics
3.
Stat Methods Med Res ; 32(11): 2083-2095, 2023 11.
Article in English | MEDLINE | ID: mdl-37559549

ABSTRACT

Contemporary works in change-point survival models mainly focus on an unknown universal change-point shared by the whole study population. However, in some situations, the change-point is plausibly individual-specific, such as when it corresponds to the telomere length or menopausal age. Also, maximum-likelihood-based inference for the fixed change-point parameter is notoriously complicated. The asymptotic distribution of the maximum-likelihood estimator is non-standard, and computationally intensive bootstrap techniques are commonly used to retrieve its sampling distribution. This article is motivated by a breast cancer study, where the disease-free survival time of the patients is postulated to be regulated by the menopausal age, which is unobserved. As menopausal age varies across patients, a fixed change-point survival model may be inadequate. Therefore, we propose a novel proportional hazards model with a random change-point. We develop a nonparametric maximum-likelihood estimation approach and devise a stable expectation-maximization algorithm to compute the estimators. Because the model is regular, we employ conventional likelihood theory for inference based on the asymptotic normality of the Euclidean parameter estimators, and the variance of the asymptotic distribution can be consistently estimated by a profile-likelihood approach. A simulation study demonstrates the satisfactory finite-sample performance of the proposed methods, which yield small bias and proper coverage probabilities. The methods are applied to the motivating breast cancer study.


Subject(s)
Breast Neoplasms , Humans , Female , Likelihood Functions , Survival Analysis , Proportional Hazards Models , Computer Simulation
4.
Stat Sin ; 33(2): 633-662, 2023 Apr.
Article in English | MEDLINE | ID: mdl-37197479

ABSTRACT

Recent technological advances have made it possible to measure multiple types of many features in biomedical studies. However, some data types or features may not be measured for all study subjects because of cost or other constraints. We use a latent variable model to characterize the relationships across and within data types and to infer missing values from observed data. We develop a penalized-likelihood approach for variable selection and parameter estimation and devise an efficient expectation-maximization algorithm to implement our approach. We establish the asymptotic properties of the proposed estimators when the number of features increases at a polynomial rate of the sample size. Finally, we demonstrate the usefulness of the proposed methods using extensive simulation studies and provide an application to a motivating multi-platform genomics study.

5.
Lifetime Data Anal ; 29(1): 87-114, 2023 01.
Article in English | MEDLINE | ID: mdl-35831702

ABSTRACT

The incubation period is a key characteristic of an infectious disease. In the outbreak of a novel infectious disease, accurate evaluation of the incubation period distribution is critical for designing effective prevention and control measures. Estimation of the incubation period distribution based on limited information from retrospective inspection of infected cases is highly challenging due to censoring and truncation. In this paper, we consider a semiparametric regression model for the incubation period and propose a sieve maximum likelihood approach for estimation based on the symptom onset time, travel history, and basic demographics of reported cases. The approach properly accounts for the pandemic growth and selection bias in data collection. We also develop an efficient computation method and establish the asymptotic properties of the proposed estimators. We demonstrate the feasibility and advantages of the proposed methods through extensive simulation studies and provide an application to a dataset on the outbreak of COVID-19.


Subject(s)
COVID-19 , Infectious Disease Incubation Period , Humans , Likelihood Functions , Retrospective Studies , COVID-19/epidemiology , Regression Analysis , Computer Simulation
6.
Biometrics ; 79(3): 2010-2022, 2023 09.
Article in English | MEDLINE | ID: mdl-36377514

ABSTRACT

Clustered data frequently arise in biomedical studies, where observations, or subunits, measured within a cluster are associated. The cluster size is said to be informative, if the outcome variable is associated with the number of subunits in a cluster. In most existing work, the informative cluster size issue is handled by marginal approaches based on within-cluster resampling, or cluster-weighted generalized estimating equations. Although these approaches yield consistent estimation of the marginal models, they do not allow estimation of within-cluster associations and are generally inefficient. In this paper, we propose a semiparametric joint model for clustered interval-censored event time data with informative cluster size. We use a random effect to account for the association among event times of the same cluster as well as the association between event times and the cluster size. For estimation, we propose a sieve maximum likelihood approach and devise a computationally-efficient expectation-maximization algorithm for implementation. The estimators are shown to be strongly consistent, with the Euclidean components being asymptotically normal and achieving semiparametric efficiency. Extensive simulation studies are conducted to evaluate the finite-sample performance, efficiency and robustness of the proposed method. We also illustrate our method via application to a motivating periodontal disease dataset.


Subject(s)
Algorithms , Models, Statistical , Likelihood Functions , Regression Analysis , Computer Simulation
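
The informative-cluster-size issue described in the abstract above can be seen in a toy calculation. The sketch below compares a pooled mean with an inverse-cluster-size weighted mean, which is what cluster weighting reduces to for an intercept-only marginal model; it illustrates the marginal approaches the abstract mentions as existing work, not the joint model the article proposes. The function names and the example data are hypothetical.

```python
from typing import Sequence


def pooled_mean(clusters: Sequence[Sequence[float]]) -> float:
    """Mean over all subunits; large clusters dominate the estimate."""
    values = [y for cluster in clusters for y in cluster]
    return sum(values) / len(values)


def cluster_weighted_mean(clusters: Sequence[Sequence[float]]) -> float:
    """Each subunit is weighted by 1 / (its cluster size), so every cluster
    contributes equally; this equals the average of the cluster means."""
    total = 0.0
    for cluster in clusters:
        weight = 1.0 / len(cluster)
        total += weight * sum(cluster)
    return total / len(clusters)
```

For example, with one cluster of four diseased subunits and one cluster with a single healthy subunit, `[[1, 1, 1, 1], [0]]`, the pooled mean is 0.8 while the cluster-weighted mean is 0.5. The two targets differ precisely because the outcome is associated with cluster size.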
7.
Biom J ; 65(1): e2100139, 2023 01.
Article in English | MEDLINE | ID: mdl-35837982

ABSTRACT

Recent technological advances have made it possible to collect high-dimensional genomic data along with clinical data on a large number of subjects. In the studies of chronic diseases such as cancer, it is of great interest to integrate clinical and genomic data to build a comprehensive understanding of the disease mechanisms. Despite extensive studies on integrative analysis, it remains an ongoing challenge to model the interaction effects between clinical and genomic variables, due to high dimensionality of the data and heterogeneity across data types. In this paper, we propose an integrative approach that models interaction effects using a single-index varying-coefficient model, where the effects of genomic features can be modified by clinical variables. We propose a penalized approach for separate selection of main and interaction effects. Notably, the proposed methods can be applied to right-censored survival outcomes based on a Cox proportional hazards model. We demonstrate the advantages of the proposed methods through extensive simulation studies and provide applications to a motivating cancer genomic study.


Subject(s)
Genomics , Neoplasms , Humans , Proportional Hazards Models , Computer Simulation , Neoplasms/genetics
8.
Ann Stat ; 50(1): 487-510, 2022 Feb.
Article in English | MEDLINE | ID: mdl-35813218

ABSTRACT

In long-term follow-up studies, data are often collected on repeated measures of multivariate response variables as well as on time to the occurrence of a certain event. To jointly analyze such longitudinal data and survival time, we propose a general class of semiparametric latent-class models that accommodates a heterogeneous study population with flexible dependence structures between the longitudinal and survival outcomes. We combine nonparametric maximum likelihood estimation with sieve estimation and devise an efficient EM algorithm to implement the proposed approach. We establish the asymptotic properties of the proposed estimators through novel use of modern empirical process theory, sieve estimation theory, and semiparametric efficiency theory. Finally, we demonstrate the advantages of the proposed methods through extensive simulation studies and provide an application to the Atherosclerosis Risk in Communities study.

9.
Biometrics ; 78(1): 165-178, 2022 03.
Article in English | MEDLINE | ID: mdl-33140426

ABSTRACT

A flexible class of semiparametric partly linear frailty transformation models is considered for analyzing clustered interval-censored data, which arise naturally in complex diseases and dental research. This class of models features two nonparametric components, resulting in a nonparametric baseline survival function and a potential nonlinear effect of a continuous covariate. The dependence among failure times within a cluster is induced by a shared, unobserved frailty term. A sieve maximum likelihood estimation method based on piecewise linear functions is proposed. The proposed estimators of the regression, dependence, and transformation parameters are shown to be strongly consistent and asymptotically normal, whereas the estimators of the two nonparametric functions are strongly consistent with optimal rates of convergence. An extensive simulation study is conducted to study the finite-sample performance of the proposed estimators. We provide an application to a dental study for illustration.


Subject(s)
Frailty , Computer Simulation , Humans , Likelihood Functions , Linear Models , Models, Statistical
10.
Stat Med ; 40(10): 2400-2412, 2021 05 10.
Article in English | MEDLINE | ID: mdl-33586218

ABSTRACT

This research is motivated by a periodontal disease dataset that possesses certain special features. The dataset consists of clustered current status time-to-event observations with large and varying cluster sizes, where the cluster size is associated with the disease outcome. Also, heavy censoring is present in the data even with long follow-up time, suggesting the presence of a cured subpopulation. In this paper, we propose a computationally efficient marginal approach, namely the cluster-weighted generalized estimating equation approach, to analyze the data based on a class of semiparametric transformation cure models. The parametric and nonparametric components of the model are estimated using a Bernstein-polynomial based sieve maximum pseudo-likelihood approach. The asymptotic properties of the proposed estimators are studied. Simulation studies are conducted to evaluate the performance of the proposed estimators in scenarios with different degrees of informative clustering and within-cluster dependence. The proposed method is applied to the motivating periodontal disease data for illustration.


Subject(s)
Models, Statistical , Cluster Analysis , Computer Simulation , Cost-Benefit Analysis , Humans , Likelihood Functions
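
The Bernstein-polynomial sieve mentioned in the abstract above approximates an unknown function by a finite combination of Bernstein basis functions. A minimal sketch of the basis on the unit interval follows; the function names are illustrative, and rescaling the time axis to [0, 1] and the specific coefficients are assumptions, not details taken from the article.

```python
from math import comb
from typing import List, Sequence


def bernstein_basis(u: float, degree: int) -> List[float]:
    """Values of the degree + 1 Bernstein basis polynomials at u in [0, 1]."""
    return [
        comb(degree, k) * u**k * (1.0 - u) ** (degree - k)
        for k in range(degree + 1)
    ]


def bernstein_polynomial(u: float, coefs: Sequence[float]) -> float:
    """Evaluate sum_k coefs[k] * B_{k,m}(u), where m = len(coefs) - 1."""
    degree = len(coefs) - 1
    return sum(c * b for c, b in zip(coefs, bernstein_basis(u, degree)))
```

One reason this basis suits functions such as cumulative hazards is that shape constraints become constraints on the coefficients: if the coefficients are nondecreasing, the resulting polynomial is nondecreasing on [0, 1].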
11.
Genome Biol ; 20(1): 52, 2019 03 07.
Article in English | MEDLINE | ID: mdl-30845957

ABSTRACT

We propose a statistical boosting method, termed I-Boost, to integrate multiple types of high-dimensional genomics data with clinical data for predicting survival time. I-Boost provides substantially higher prediction accuracy than existing methods. By applying I-Boost to The Cancer Genome Atlas, we show that the integration of multiple genomics platforms with clinical variables improves the prediction of survival time over the use of clinical variables alone; gene expression values are typically more prognostic of survival time than other genomics data types; and gene modules/signatures are at least as prognostic as the collection of individual gene expression data.


Subject(s)
Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Genomics/methods , Neoplasms/mortality , Software , Humans , Models, Statistical , Neoplasms/genetics , Prognosis , Survival Rate
12.
J Am Stat Assoc ; 114(528): 1778-1786, 2019.
Article in English | MEDLINE | ID: mdl-31920211

ABSTRACT

Analysis of genomic data is often complicated by the presence of missing values, which may arise due to cost or other reasons. The prevailing approach of single imputation is generally invalid if the imputation model is misspecified. In this paper, we propose a robust score statistic based on imputed data for testing the association between a phenotype and a genomic variable with (partially) missing values. We fit a semiparametric regression model for the genomic variable against an arbitrary function of the linear predictor in the phenotype model and impute each missing value by its estimated posterior expectation. We show that the score statistic with such imputed values is asymptotically unbiased under general missing-data mechanisms, even when the imputation model is misspecified. We develop a spline-based method to estimate the semiparametric imputation model and derive the asymptotic distribution of the corresponding score statistic with a consistent variance estimator using sieve approximation theory and empirical process theory. The proposed test is computationally feasible regardless of the number of independent variables in the imputation model. We demonstrate the advantages of the proposed method over existing methods through extensive simulation studies and provide an application to a major cancer genomics study.

13.
J Am Stat Assoc ; 113(522): 893-905, 2018.
Article in English | MEDLINE | ID: mdl-30083023

ABSTRACT

Structural equation modeling is commonly used to capture complex structures of relationships among multiple variables, both latent and observed. We propose a general class of structural equation models with a semiparametric component for potentially censored survival times. We consider nonparametric maximum likelihood estimation and devise a combined Expectation-Maximization and Newton-Raphson algorithm for its implementation. We establish conditions for model identifiability and prove the consistency, asymptotic normality, and semiparametric efficiency of the estimators. Finally, we demonstrate the satisfactory performance of the proposed methods through simulation studies and provide an application to a motivating cancer study that contains a variety of genomic variables. Supplementary materials for this article are available online.

14.
Biometrics ; 72(4): 1173-1183, 2016 12.
Article in English | MEDLINE | ID: mdl-27060984

ABSTRACT

In many classical estimation problems, the parameter space has a boundary. In most cases, the standard asymptotic properties of the estimator do not hold when some of the underlying true parameters lie on the boundary. However, without knowledge of the true parameter values, confidence intervals constructed assuming that the parameters lie in the interior are generally over-conservative. A penalized estimation method is proposed in this article to address this issue. An adaptive lasso procedure is employed to shrink the parameters to the boundary, yielding oracle inference that adapts to whether or not the true parameters are on the boundary. When the true parameters are on the boundary, the inference is equivalent to that which would be achieved with a priori knowledge of the boundary, while if the converse is true, the inference is equivalent to that which is obtained in the interior of the parameter space. The method is demonstrated under two practical scenarios, namely the frailty survival model and linear regression with order-restricted parameters. Simulation studies and real data analyses show that the method performs well with realistic sample sizes and exhibits certain advantages over standard methods.


Subject(s)
Linear Models , Models, Statistical , Survival Analysis , Computer Simulation , Databases, Factual , Humans , Likelihood Functions , Lung Neoplasms/mortality , Regression Analysis , Sample Size , United States , United States Department of Veterans Affairs
15.
PLoS One ; 9(8): e101972, 2014.
Article in English | MEDLINE | ID: mdl-25093728

ABSTRACT

OBJECTIVE: To investigate the effectiveness of an educational poster in improving secondary school students' knowledge of emergency management of dental trauma. METHODS: A cluster randomised controlled trial was conducted. 16 schools with a total of 671 secondary students who can read Chinese or English were randomised into intervention (poster, 8 schools, 364 students) and control groups (8 schools, 305 students) at the school level. Baseline knowledge of dental trauma was obtained by a questionnaire. A poster containing information on dental trauma management was displayed in a classroom for 2 weeks in each school in the intervention group, whereas in the control group no such poster was displayed. Students of both groups completed the same questionnaire after 2 weeks. RESULTS: Two-week display of posters improved the knowledge score by 1.25 (p-value = 0.0407) on average. CONCLUSION: An educational poster on dental trauma management significantly improved the level of knowledge of secondary school students in Hong Kong. TRIAL REGISTRATION: HKClinicalTrial.com HKCTR-1343 ClinicalTrials.gov NCT01809457.


Subject(s)
Dental Care/methods , Emergencies , Health Education/methods , Knowledge , Posters as Topic , Tooth Injuries/therapy , Adolescent , Adult , Child , Female , Humans , Information Dissemination/methods , Male , Program Evaluation , Schools , Students , Young Adult
16.
PLoS One ; 9(1): e84406, 2014.
Article in English | MEDLINE | ID: mdl-24400088

ABSTRACT

OBJECTIVES: To investigate Hong Kong secondary school students' knowledge of emergency management of dental trauma. METHOD: A questionnaire survey on randomly selected secondary school students using cluster sampling. RESULTS: Only 36.6% (209/571) of the respondents were able to correctly identify the appropriate place for treatment of dental injury. 55.2% of the respondents knew the suitable time for treatment. Only 24.7% of the respondents possessed the knowledge of how to correctly manage fractured teeth. Only 23.6% of them knew how to manage displaced teeth. 62.5% of them correctly answered that knocked-out deciduous teeth should not be replanted to the original position, but few of them (23.6%) knew that permanent teeth should be replanted. Moreover, 37.1% of the respondents correctly identified at least one of the appropriate media for storing a knocked-out tooth. First-aid training and acquisition of dental injury information from other sources were significant factors, with positive responses to these questions associated with higher scores. CONCLUSION: Hong Kong secondary school students' knowledge of emergency management of dental trauma is considered insufficient. An educational campaign in secondary schools dedicated to students is recommended. Prior first-aid training and acquisition of dental injury information from other sources positively relate to the level of knowledge. Dental trauma emergency management is recommended to be added to first-aid publications and be taught to students and health professionals. TRIAL REGISTRATION: Hong Kong Clinical Trial Centre HKCTR-1344.


Subject(s)
First Aid , Health Knowledge, Attitudes, Practice , Students , Tooth Injuries/epidemiology , Wounds and Injuries/epidemiology , Adolescent , Child , Dental Care , Dental Health Surveys , Female , Hong Kong/epidemiology , Humans , Male , Schools , Surveys and Questionnaires , Young Adult
17.
PLoS One ; 8(9): e74833, 2013.
Article in English | MEDLINE | ID: mdl-24147154

ABSTRACT

OBJECTIVE: To investigate the effectiveness of educational posters in improving the knowledge level of primary and secondary school teachers regarding emergency management of dental trauma. METHODS: A cluster randomised controlled trial was conducted. 32 schools with a total of 515 teachers were randomised into intervention (poster) and control groups at the school level. Teachers' baseline levels of knowledge about dental trauma were obtained by using a questionnaire. Posters containing information on dental trauma management were displayed in the school medical room, the common room used by staff, and on a notice board for 2 weeks in each school of the intervention group; in the control group, no posters were displayed. Teachers in both groups completed the questionnaire after 2 weeks. RESULTS: The teachers in the intervention schools (where posters were displayed for 2 weeks) showed statistically significant improvement in scores in cases where they had not previously learned about dental emergencies from sources other than first aid training, with an average score increase of 2.6656 (score range of questionnaire, -13 to 9; p-value <0.0001). CONCLUSION: Educational posters on the management of dental trauma can significantly improve the level of knowledge of primary and secondary school teachers in Hong Kong. TRIAL REGISTRATION: HKClinicalTrials.com HKCTR-1307 ClinicalTrials.gov: NCT01707355.


Subject(s)
Faculty , First Aid , Health Knowledge, Attitudes, Practice , Schools , Tooth Injuries , Adult , Aged , Case-Control Studies , Female , Humans , Male , Middle Aged , Surveys and Questionnaires , Young Adult
18.
Biom J ; 55(5): 771-88, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23720128

ABSTRACT

There is a growing interest in the analysis of survival data with a cured proportion, particularly in tumor recurrence studies. Biologically, it is reasonable to assume that the recurrence time is mainly affected by the overall health condition of the patient that depends on some covariates such as age, sex, or treatment type received. We propose a semiparametric frailty-Cox cure model to quantify the overall health condition of the patient by a covariate-dependent frailty that has a discrete mass at zero to characterize the cured patients, and a positive continuous part to characterize the heterogeneous health conditions among the uncured patients. A multiple imputation estimation method is proposed for the right-censored case, which is further extended to accommodate interval-censored data. Simulation studies show that the performance of the proposed method is highly satisfactory. For illustration, the model is fitted to a set of right-censored melanoma incidence data and a set of interval-censored breast cosmesis data. Our analysis suggests that patients receiving treatment of radiotherapy with adjuvant chemotherapy have a significantly higher probability of breast retraction, but also a lower hazard rate of breast retraction among those patients who will eventually experience the event with similar health conditions. This interpretation is very different from that based on models without a cure component, under which the treatment of radiotherapy with adjuvant chemotherapy significantly increases the risk of breast retraction.


Subject(s)
Biometry/methods , Models, Statistical , Neoplasms/therapy , Humans , Kaplan-Meier Estimate , Neoplasms/drug therapy , Neoplasms/radiotherapy , Survival Analysis , Treatment Outcome
19.
Stat Med ; 32(8): 1283-93, 2013 Apr 15.
Article in English | MEDLINE | ID: mdl-22987667

ABSTRACT

In medical research, count data often contain excess zeros, which make the standard Poisson regression model inadequate. We proposed a covariate-dependent random effect model to accommodate the excess zeros and the heterogeneity in the population simultaneously. This work is motivated by a data set from a survey on the dental health status of Hong Kong preschool children where the response variable is the number of decayed, missing, or filled teeth. The random effect has a sound biological interpretation as the overall oral health status or other personal qualities of an individual child that is unobserved and unable to be quantified easily. The overall measure of oral health status, responsible for accommodating the excessive zeros and also the heterogeneity among the children, is covariate dependent. This covariate-dependent random effect model allows one to distinguish whether a potential covariate has an effect on the conceived overall oral health condition of the children, that is, the random effect, or has a direct effect on the magnitude of the counts, or both. We proposed a multiple imputation approach for estimation of the parameters. We discussed the choice of the imputation size. We evaluated the performance of the proposed estimation method through simulation studies, and we applied the model and method to the dental data.


Subject(s)
Algorithms , Data Interpretation, Statistical , Models, Statistical , Child, Preschool , Computer Simulation , Hong Kong , Humans , Oral Health
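
To see why excess zeros make a plain Poisson model inadequate, as the abstract above states, it may help to compare the Poisson probability of a zero count with that of a simple zero-inflated Poisson mixture. This is the textbook zero-inflated model, swapped in as a simpler stand-in; it is not the covariate-dependent random effect model the article proposes. The function names and the parameters `lam` and `p_zero` are assumptions for illustration.

```python
import math


def poisson_pmf(y: int, lam: float) -> float:
    """Poisson probability of observing count y with mean lam."""
    return math.exp(-lam) * lam**y / math.factorial(y)


def zip_pmf(y: int, lam: float, p_zero: float) -> float:
    """Zero-inflated Poisson: with probability p_zero the count is a
    structural zero; otherwise it is drawn from Poisson(lam)."""
    base = (1.0 - p_zero) * poisson_pmf(y, lam)
    return p_zero + base if y == 0 else base
```

With `lam = 2`, a Poisson model assigns probability `exp(-2)` (about 0.135) to a zero count, so a sample in which a much larger share of children have no decayed, missing, or filled teeth cannot be matched without an extra source of zeros such as `p_zero`.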