ABSTRACT
Deep learning has continuously attained huge success in diverse fields, while its application to survival data analysis remains limited and deserves further exploration. For the analysis of current status data, a deep partially linear Cox model is proposed to circumvent the curse of dimensionality. Modeling flexibility is attained by using deep neural networks (DNNs) to accommodate nonlinear covariate effects and monotone splines to approximate the baseline cumulative hazard function. We establish the convergence rate of the proposed maximum likelihood estimators. Moreover, we derive that the finite-dimensional estimator for treatment covariate effects is $\sqrt{n}$-consistent, asymptotically normal, and attains semiparametric efficiency. Finally, we demonstrate the performance of our procedures through extensive simulation studies and application to real-world data on news popularity.
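As a rough illustration of the likelihood behind this abstract: with current status data, each subject is observed only once, at a monitoring time C, and all that is recorded is whether the event has already occurred. Under a Cox model with cumulative hazard Λ0(C)·exp(η), the contribution is 1 − exp(−Λ0(C)e^η) for an observed event and exp(−Λ0(C)e^η) otherwise. The sketch below evaluates that log-likelihood. The function name and argument layout are assumptions made for illustration; in the abstract's setting η would combine a linear term with a DNN output and Λ0 would come from a monotone spline, neither of which is modeled here.

```python
import math

def current_status_loglik(delta, cum_hazard, linear_pred):
    """Log-likelihood for current status data under a Cox-type model.

    delta       : 1 if the event occurred by the monitoring time, else 0
    cum_hazard  : baseline cumulative hazard at the monitoring time (> 0)
    linear_pred : covariate effect eta for each subject (hypothetical input;
                  it could be x'beta plus a network output)
    """
    total = 0.0
    for d, lam0, eta in zip(delta, cum_hazard, linear_pred):
        h = lam0 * math.exp(eta)  # subject-specific cumulative hazard
        if d == 1:
            # log P(T <= C) = log(1 - exp(-h)); expm1 is accurate for small h
            total += math.log(-math.expm1(-h))
        else:
            # log P(T > C) = -h
            total -= h
    return total
```

Maximizing this quantity over the parameters that generate `cum_hazard` and `linear_pred` is the estimation problem the abstract refers to; the optimizer itself is left out of the sketch.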
Subjects
Proportional Hazards Models, Likelihood Functions, Survival Analysis, Computer Simulation, Linear Models
ABSTRACT
The application of transfer learning in fault diagnosis has been developed in recent years. It can use existing data to solve the problem of fault recognition under different working conditions. Due to the complexity of the equipment and the openness of the working environment in industrial production, the status of the equipment is changeable, and the collected signals can have new fault classes. Therefore, the open set recognition ability of the transfer learning method is an urgent research direction. The existing transfer learning model can have a severe negative transfer problem when solving the open set problem, resulting in the aliasing of samples in the feature space and the inability to separate the unknown classes. To solve this problem, we propose a Weighted Domain Adaptation with Double Classifiers (WDADC) method. Specifically, WDADC designs the weighting module based on Jensen-Shannon divergence, which can evaluate the similarity between each sample in the target domain and each class in the source domain. Based on this similarity, a weighted loss is constructed to promote the positive transfer between shared classes in the two domains to realize the recognition of shared classes and the separation of unknown classes. In addition, the structure of double classifiers in WDADC can mitigate the overfitting of the model by maximizing the discrepancy, which helps extract the domain-invariant and class-separable features of the samples when the discrepancy between the two domains is large. The model's performance is verified in several fault datasets of rotating machinery. The results show that the method is effective in open set fault diagnosis and superior to the common domain adaptation methods.
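The abstract does not spell out how the Jensen-Shannon weighting is computed, so the following is only a minimal sketch of the underlying idea: compare a target sample's predicted class distribution with one reference distribution per source class, and turn the divergence into a similarity in [0, 1]. The helper `similarity_weights` and the notion of per-class "prototype" distributions are hypothetical and are not taken from WDADC itself.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log), bounded by log(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def similarity_weights(target_probs, class_prototypes):
    """Hypothetical weighting: 1 - JS / log(2) for each source class.

    target_probs     : predicted class distribution of one target sample
    class_prototypes : one reference distribution per source class
    """
    return [1.0 - js_divergence(target_probs, proto) / math.log(2)
            for proto in class_prototypes]
```

A target sample whose weights are low for every source class is a candidate for the unknown class, which is the kind of separation the abstract describes; any threshold for that decision would be a further assumption.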
ABSTRACT
When a single gene influences more than one trait, known as pleiotropy, it is important to detect pleiotropy to improve the biological understanding of a gene. This can lead to improved screening, diagnosis, and treatment of diseases. Yet, most current multivariate methods to evaluate pleiotropy test the null hypothesis that none of the traits are associated with a variant; departures from the null could be driven by just one associated trait. A formal test of pleiotropy should assume a null hypothesis that one or fewer traits are associated with a genetic variant. We recently developed statistical methods to analyze pleiotropy for quantitative traits having a multivariate normal distribution. We now extend this approach to traits that can be modeled by generalized linear models, such as analysis of binary, ordinal, or quantitative traits, or a mixture of these types of traits. Based on methods from estimating equations, we developed a new test for pleiotropy. We then extended the testing framework to a sequential approach to test the null hypothesis that $k+1$ traits are associated, given that the null of $k$ associated traits was rejected. This provides a testing framework to determine the number of traits associated with a genetic variant, as well as which traits, while accounting for correlations among the traits. By simulations, we illustrate the Type-I error rate and power of our new methods, describe how they are influenced by sample size, the number of traits, and the trait correlations, and apply the new methods to a genome-wide association study of multivariate traits measuring symptoms of major depression. Our new approach provides a quantitative assessment of pleiotropy, enhancing current analytic practice.
Subjects
Biostatistics/methods, Genetic Pleiotropy, Genome-Wide Association Study/methods, Linear Models, Multivariate Analysis, Computer Simulation, Depressive Disorder/genetics, Humans
ABSTRACT
This article discusses regression analysis of right-censored failure time data where there may exist a cured subgroup and covariate effects may vary with time, a phenomenon that often occurs in medical studies. To address the problem, we discuss a class of varying coefficient transformation models along with a logistic model for the cured subgroup. For inference, a sieve maximum likelihood approach is developed with the use of spline functions, and the asymptotic properties of the proposed estimators are established. The proposed method can be easily implemented, and the simulation study suggests that it works well in practical situations. An illustrative example is provided.
Subjects
Algorithms, Likelihood Functions, Logistic Models, Asymptomatic Diseases/therapy, Bias, Computer Simulation, Humans, Kidney Transplantation, Proportional Hazards Models, Regression Analysis
ABSTRACT
High-throughput sequencing technologies have enabled large-scale studies of the role of the human microbiome in health conditions and diseases. Microbial community-level association testing, a critical step in establishing the connection between overall microbiome composition and an outcome of interest, is now routinely performed in many studies. However, current microbiome association tests all focus on a single outcome. It has become increasingly common for a microbiome study to collect multiple, possibly related, outcomes to maximize the power of discovery. As these outcomes may share common mechanisms, jointly analyzing them can amplify the association signal and improve statistical power to detect potential associations. We propose the multivariate microbiome regression-based kernel association test (MMiRKAT) for testing association between multiple continuous outcomes and overall microbiome composition, where the kernel used in MMiRKAT is based on Bray-Curtis or UniFrac distance. MMiRKAT directly regresses all outcomes on the microbiome profiles via a semiparametric kernel machine regression framework, which allows for covariate adjustment and evaluates the association via a variance-component score test. Because most current microbiome studies have small sample sizes, a novel small-sample correction procedure is implemented in MMiRKAT to correct for the conservativeness of the association test when the sample size is small or moderate. The proposed method is assessed via simulation studies and an application to a real data set examining the association between host gene expression and mucosal microbiome composition. We demonstrate that MMiRKAT is more powerful than the large-sample-based multivariate kernel association test while controlling the type I error. A free implementation of MMiRKAT in the R language is available at http://research.fhcrc.org/wu/en.html.
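To make the "kernel based on Bray-Curtis distance" step concrete, here is a small sketch of one common way to turn a pairwise distance matrix into a similarity kernel: square the distances and double-center them (Gower centering). Whether MMiRKAT uses exactly this construction is an assumption; the function names are illustrative, and the positive semidefinite correction that is usually applied afterwards is omitted to keep the example free of linear algebra libraries.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two nonnegative abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den > 0 else 0.0

def distance_to_kernel(profiles):
    """Gower-centered kernel K = -1/2 * J * D^2 * J with J = I - 11'/n.

    profiles : list of abundance vectors, one per sample
    (assumed construction; no eigenvalue correction is applied here)
    """
    n = len(profiles)
    d2 = [[bray_curtis(profiles[i], profiles[j]) ** 2 for j in range(n)]
          for i in range(n)]
    # D^2 is symmetric, so row means and column means coincide.
    row_mean = [sum(row) / n for row in d2]
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)]
            for i in range(n)]
```

The resulting matrix is symmetric with rows summing to zero, and it is the kind of object a variance-component score test would take as input alongside the outcomes and covariates.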
Subjects
Adenomatous Polyposis Coli/genetics, Genetic Association Studies, Genetic Markers/genetics, Microbiota/genetics, Genetic Models, Single Nucleotide Polymorphism/genetics, Adenomatous Polyposis Coli/microbiology, Case-Control Studies, Computer Simulation, High-Throughput Nucleotide Sequencing, Humans, Mucous Membrane/microbiology, Phylogeny, Sample Size
ABSTRACT
Time-to-event data are very common in observational studies. Unlike randomized experiments, observational studies suffer from both observed and unobserved confounding biases. To adjust for observed confounding in survival analysis, the commonly used methods are the Cox proportional hazards (PH) model, the weighted logrank test, and the inverse probability of treatment weighted Cox PH model. These methods do not rely on fully parametric models, but their practical performances are highly influenced by the validity of the PH assumption. Also, there are few methods addressing the hidden bias in causal survival analysis. We propose a strategy to test for survival function differences based on the matching design and explore sensitivity of the P-values to assumptions about unmeasured confounding. Specifically, we apply the paired Prentice-Wilcoxon (PPW) test or the modified PPW test to the propensity score matched data. Simulation studies show that the PPW-type test has higher power in situations when the PH assumption fails. For potential hidden bias, we develop a sensitivity analysis based on the matched pairs to assess the robustness of our finding, following Rosenbaum's idea for nonsurvival data. For a real data illustration, we apply our method to an observational cohort of chronic liver disease patients from a Mayo Clinic study. The PPW test based on observed data initially shows evidence of a significant treatment effect. But this finding is not robust, as the sensitivity analysis reveals that the P-value becomes nonsignificant if there exists an unmeasured confounder with a small impact.
Subjects
Observational Studies as Topic/statistics & numerical data, Survival Analysis, Bias, Biostatistics, Causality, Cohort Studies, Computer Simulation, Epidemiologic Confounding Factors, Humans, Kaplan-Meier Estimate, Biliary Liver Cirrhosis/drug therapy, Biliary Liver Cirrhosis/mortality, Logistic Models, Penicillamine/therapeutic use, Propensity Score, Proportional Hazards Models
ABSTRACT
This paper discusses regression analysis of doubly censored failure time data when there may exist a cured subgroup. By doubly censored data, we mean that the failure time of interest denotes the elapsed time between two related events and the observations on both event times can suffer censoring (Sun in The statistical analysis of interval-censored failure time data. Springer, New York, 2006). One typical example of such data is given by an acquired immune deficiency syndrome cohort study. Although many methods have been developed for their analysis (De Gruttola and Lagakos in Biometrics 45:1-12, 1989; Sun et al. in Biometrics 55:909-914, 1999; 60:637-643, 2004; Pan in Biometrics 57:1245-1250, 2001), there does not seem to exist an established method for the situation with a cured subgroup. This paper discusses this latter problem and presents a sieve approximation maximum likelihood approach. In addition, the asymptotic properties of the resulting estimators are established, and an extensive simulation study indicates that the method works well in practical situations. An application is also provided.
Subjects
Bias, Probability, Regression Analysis, Algorithms, HIV Infections/diagnosis, HIV Infections/drug therapy, Likelihood Functions, Theoretical Models, Time Factors, Treatment Outcome
ABSTRACT
In many clinical studies, patients may be asked to report their medication adherence, presence of side effects, substance use, and hospitalization information during the study period. However, the exact occurrence times of these recurrent events may not be available due to privacy protection, recall difficulty, or incomplete medical records. Instead, the only available information is whether the events of interest have occurred during the past period. In this paper, we refer to such incomplete recurrent event data as repeated current status data. Currently, there are no valid standard methods for this kind of data. We propose to use the Andersen-Gill proportional intensity assumption to analyze such data. Specifically, we propose a maximum sieve likelihood approach for inference, and we show that the proposed estimators of the regression coefficients are consistent, asymptotically normal, and attain the semiparametric efficiency bound. Simulation studies show that the proposed approach performs well with small sample sizes. Finally, our method is applied to study medication adherence in a clinical trial on non-psychotic major depressive disorder.
ABSTRACT
This paper studies semiparametric regression analysis of panel count data, which arise naturally when recurrent events are considered. Such data frequently occur in medical follow-up studies and reliability experiments, for example. To explore the nonlinear interactions between covariates, we propose a class of partially linear models with possibly varying coefficients for the mean function of the counting processes with panel count data. The functional coefficients are estimated by B-spline function approximations. The estimation procedures are based on maximum pseudo-likelihood and likelihood approaches and they are easy to implement. The asymptotic properties of the resulting estimators are established, and their finite-sample performance is assessed by Monte Carlo simulation studies. We also demonstrate the value of the proposed method by the analysis of a cancer data set, where the new modeling approach provides more comprehensive information than the usual proportional mean model.
Subjects
Likelihood Functions, Linear Models, Statistical Models, Humans, Regression Analysis, Reproducibility of Results
ABSTRACT
We consider a general semiparametric hazards regression model that encompasses the Cox proportional hazards model and the accelerated failure time model for survival analysis. To overcome the nonexistence of the maximum likelihood, we derive a kernel-smoothed profile likelihood function and prove that the resulting estimates of the regression parameters are consistent and achieve semiparametric efficiency. In addition, we develop penalized structure selection techniques to determine which covariates constitute the accelerated failure time model and which covariates constitute the proportional hazards model. The proposed method is able to estimate the model structure consistently and model parameters efficiently. Furthermore, variance estimation is straightforward. The proposed estimation performs well in simulation studies and is applied to the analysis of a real data set.
Subjects
Likelihood Functions, Proportional Hazards Models, Survival Analysis, Anthracyclines/therapeutic use, Child, Computer Simulation, Female, Hodgkin Disease/drug therapy, Hodgkin Disease/radiotherapy, Humans, Male, Radiation Dosage
ABSTRACT
Event history studies occur in many fields including economics, medical studies, and social science. In such studies concerning some recurrent events, two types of data have been extensively discussed in the literature. One is recurrent event data that arise if study subjects are monitored or observed continuously. In this case, the observed information provides the times of all occurrences of the recurrent events of interest. The other is panel count data, which occur if the subjects are monitored or observed only periodically. This can happen if the continuous observation is too expensive or not practical, and in this case, only the numbers of occurrences of the events between subsequent observation times are available. In this paper, we discuss a third type of data, which is a mixture of recurrent event and panel count data and for which there exists little literature. For regression analysis of such data, we present a marginal mean model and propose an estimating equation-based approach for estimation of regression parameters. We conduct a simulation study to assess the finite sample performance of the proposed methodology, and the results indicate that it works well for practical situations. Finally, we apply it to a motivating study on childhood cancer survivors.
Subjects
Statistical Models, Local Neoplasm Recurrence/epidemiology, Neoplasms/epidemiology, Regression Analysis, Adolescent, Computer Simulation, Female, Humans, Local Neoplasm Recurrence/pathology, Neoplasms/pathology, Poisson Distribution, Pregnancy, Young Adult
ABSTRACT
Longitudinal data analysis is one of the most discussed and applied areas in statistics, and a great deal of literature has been developed for it. However, most of the existing literature focuses on the situation where observation times are fixed or can be treated as fixed constants. This paper considers the situation where these observation times may be random variables and, more importantly, may be related to the underlying longitudinal variable or process of interest. Furthermore, covariate effects may be time-varying. For the analysis, a joint modeling approach is proposed, and in particular, an estimating equation-based procedure is developed for estimation of the time-varying regression parameters. Both asymptotic and finite-sample properties of the proposed estimates are established. The methodology is applied to an acute myeloid leukemia trial that motivated this study.
Subjects
Health Care Costs, Longitudinal Studies/methods, Statistical Models, Computer Simulation, Female, Humans, Infections/complications, Infections/economics, Acute Myeloid Leukemia/complications, Acute Myeloid Leukemia/economics, Male
ABSTRACT
Recurrent event data occur in many clinical and observational studies, and in these situations, there may exist a terminal event such as death that is related to the recurrent event of interest. In addition, sometimes more than one type of recurrent event may occur, that is, one may encounter multivariate recurrent event data with some dependent terminal event. For the analysis of such data, one must take into account the dependence among different types of recurrent events and that between the recurrent events and the terminal event. In this paper, we extend a method for univariate recurrent and terminal events and propose a joint modeling approach for regression analysis of the data. We establish the finite-sample and asymptotic properties of the resulting estimates of unknown parameters. The method is applied to a set of bivariate recurrent event data arising from a long-term follow-up study of childhood cancer survivors.
Subjects
Statistical Data Interpretation, Multivariate Analysis, Adolescent, Adult, Child, Computer Simulation/statistics & numerical data, Female, Humans, Likelihood Functions, Male, Multicenter Studies as Topic/statistics & numerical data, Neoplasms/mortality, Recurrence, Survival Analysis, Young Adult
ABSTRACT
In this article, we propose a class of Box-Cox transformation models for recurrent event data, which includes the proportional means models as special cases. The new model offers great flexibility in formulating the effects of covariates on the mean functions of counting processes while leaving the stochastic structure completely unspecified. For inference on the proposed models, we apply a profile pseudo-partial likelihood method to estimate the model parameters via estimating equation approaches. We establish large-sample properties of the estimators and examine their performance in moderate-sized samples through simulation studies. In addition, some graphical and numerical procedures are presented for model checking. An application to a set of multiple-infection data taken from a clinical study on chronic granulomatous disease (CGD) is also presented.
Subjects
Statistical Data Interpretation, Biological Models, Statistical Models, Recurrence, Computer Simulation, Chronic Granulomatous Disease/immunology, Humans, Infections/microbiology, Interferon-gamma/administration & dosage, Interferon-gamma/pharmacology
ABSTRACT
Interval-censored failure time data often arise in clinical trials and medical follow-up studies, and a few methods have been proposed for their regression analysis using various regression models (Finkelstein (1986); Huang (1996); Lin, Oakes, and Ying (1998); Sun (2006)). This paper proposes an estimating equation-based approach for regression analysis of interval-censored failure time data with the additive hazards model. The proposed approach is robust and applies to both noninformative and informative censoring cases. A major advantage of the proposed method is that it does not involve estimation of any baseline hazard function. The implementation of the proposed approach is easy and fast. Asymptotic properties of the proposed estimates are established, and some simulation results and an application are provided.
ABSTRACT
Recurrent event data occur in many clinical and observational studies (Cook and Lawless, Analysis of recurrent event data, 2007) and in these situations, there may exist a terminal event such as death that is related to the recurrent event of interest (Ghosh and Lin, Biometrics 56:554-562, 2000; Wang et al., J Am Stat Assoc 96:1057-1065, 2001; Huang and Wang, J Am Stat Assoc 99:1153-1165, 2004; Ye et al., Biometrics 63:78-87, 2007). In addition, sometimes there may exist more than one type of recurrent event, that is, one faces multivariate recurrent event data with some dependent terminal event (Chen and Cook, Biostatistics 5:129-143, 2004). For the analysis of such data, one has to take into account the dependence both among different types of recurrent events and between the recurrent and terminal events. In this paper, we propose a joint modeling approach for regression analysis of the data, and both finite-sample and asymptotic properties of the resulting estimates of unknown parameters are established. The methodology is applied to a set of bivariate recurrent event data arising from a study of leukemia patients.
Subjects
Statistical Data Interpretation, Statistical Models, Multivariate Analysis, Regression Analysis, Bacterial Infections/complications, Bacterial Infections/epidemiology, Female, Humans, Acute Myeloid Leukemia/complications, Acute Myeloid Leukemia/drug therapy, Male, Mycoses/complications, Mycoses/epidemiology, Recurrence, Virus Diseases/complications, Virus Diseases/epidemiology
ABSTRACT
This article discusses regression analysis of mixed interval-censored failure time data. Such data frequently occur across a variety of settings, including clinical trials, epidemiologic investigations, and many other biomedical studies with a follow-up component. For example, mixed failure times are commonly found in the two largest studies of long-term survivorship after childhood cancer, the datasets that motivated this work. However, most existing methods for failure time data consider only right-censored or only interval-censored failure times, not the more general case where times may be mixed. Additionally, among regression models developed for mixed interval-censored failure times, the proportional hazards formulation is generally assumed. It is well-known that the proportional hazards model may be inappropriate in certain situations, and alternatives are needed to analyze mixed failure time data in such cases. To fill this need, we develop a maximum likelihood estimation procedure for the proportional odds regression model with mixed interval-censored data. We show that the resulting estimators are consistent and asymptotically Gaussian. An extensive simulation study is performed to assess the finite-sample properties of the method, and this investigation indicates that the proposed method works well for many practical situations. We then apply our approach to examine the impact of age at cranial radiation therapy on risk of growth hormone deficiency in long-term survivors of childhood cancer.
ABSTRACT
This paper discusses regression analysis of multivariate current status failure time data (The Statistical Analysis of Interval-censored Failure Time Data. Springer: New York, 2006), which occur quite often in, for example, tumorigenicity experiments and epidemiologic investigations of the natural history of a disease. For the problem, several marginal approaches have been proposed that model each failure time of interest individually (Biometrics 2000; 56:940-943; Statist. Med. 2002; 21:3715-3726). In this paper, we present a full likelihood approach based on the proportional hazards frailty model. For estimation, an Expectation Maximization (EM) algorithm is developed, and simulation studies suggest that the presented approach performs well in practical situations. The approach is applied to a set of bivariate current status data arising from a tumorigenicity experiment.
Subjects
Algorithms, Computer Simulation, Multivariate Analysis, Proportional Hazards Models, Regression Analysis, Adrenal Gland Neoplasms/chemically induced, Animals, Chloroprene/toxicity, Female, Humans, Lung Neoplasms/chemically induced, Male, Rats, Inbred F344 Rats, Survival Analysis
ABSTRACT
This paper discusses regression analysis of panel count data that often arise in longitudinal studies concerning occurrence rates of certain recurrent events. With panel count data, each study subject is observed only at discrete time points rather than under continuous observation. Furthermore, both observation and follow-up times can vary from subject to subject and may be correlated with the recurrent events. For inference, we propose some shared frailty models and develop estimating equations for estimation of the regression parameters. The proposed estimates are consistent and asymptotically normal. The finite-sample properties of the proposed estimates are investigated through simulation, and an illustrative example from a cancer study is provided.
Subjects
Regression Analysis, Biometry, Statistical Data Interpretation, Humans, Longitudinal Studies, Statistical Models, Recurrence, Time Factors, Urinary Bladder Neoplasms/drug therapy
ABSTRACT
Variable selection is an important issue in all regression analysis, and in this paper we discuss it in the context of regression analysis of recurrent event data. Recurrent event data often occur in long-term studies in which individuals may experience the events of interest more than once, and their analysis has recently attracted a great deal of attention (Andersen et al., Statistical models based on counting processes, 1993; Cook and Lawless, Biometrics 52:1311-1323, 1996, The analysis of recurrent event data, 2007; Cook et al., Biometrics 52:557-571, 1996; Lawless and Nadeau, Technometrics 37:158-168, 1995; Lin et al., J R Stat Soc B 69:711-730, 2000). However, there seem to be no established approaches to variable selection for recurrent event data. For the problem, we adopt the idea behind the nonconcave penalized likelihood approach proposed in Fan and Li (J Am Stat Assoc 96:1348-1360, 2001) and develop a nonconcave penalized estimating function approach. The proposed approach selects variables and estimates regression coefficients simultaneously, and an algorithm is presented for this process. We show that the proposed approach performs as well as the oracle procedure in that it yields the estimates as if the correct submodel were known. Simulation studies are conducted to assess the performance of the proposed approach and suggest that it works well in practical situations. The proposed methodology is illustrated using data from a chronic granulomatous disease study.
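The abstract does not say which nonconcave penalty is used. A natural candidate, assumed here purely for illustration, is the SCAD penalty introduced in the Fan and Li paper it cites, with their suggested default a = 3.7. The sketch below evaluates the penalty and its derivative; in a penalized estimating function, the derivative term is what shrinks small coefficients toward zero while leaving large ones essentially unpenalized.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lambda(|theta|); assumes lam > 0 and a > 2."""
    t = abs(theta)
    if t <= lam:
        return lam * t  # behaves like the lasso near zero
    if t <= a * lam:
        # quadratic transition between the linear and constant parts
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return (a + 1) * lam * lam / 2  # constant: no extra shrinkage

def scad_derivative(theta, lam, a=3.7):
    """Derivative of the SCAD penalty with respect to |theta|."""
    t = abs(theta)
    if t <= lam:
        return lam
    if t <= a * lam:
        return (a * lam - t) / (a - 1)
    return 0.0  # large coefficients are not penalized further
```

Because the derivative vanishes for coefficients larger than a·λ, large effects are estimated with little bias, which is the property behind the oracle behavior the abstract mentions.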