ABSTRACT
The progression of disease in an individual can be described mathematically as a stochastic process. The individual experiences a failure event when the disease path first reaches or crosses a critical disease level. This crossing defines the failure event and a first hitting time or time-to-event, both of which are important in medical contexts. When the context involves explanatory variables, there is usually interest in incorporating regression structures into the analysis, and the methodology known as threshold regression comes into play. To date, most applications of threshold regression have been based on parametric families of stochastic processes. This paper presents a semiparametric form of threshold regression that requires the stochastic process to have only one key property, namely, stationary independent increments. As this property is frequently encountered in real applications, the model has potential for use in many fields. The mathematical underpinnings of this semiparametric approach to estimation and prediction are described. The basic data element required by the model is a pair of readings representing the observed change in time and the observed change in disease level, arising from either a failure event or survival of the individual to the end of the data record. An extension is presented for applications where the underlying disease process is unobservable but component covariate processes are available from which a surrogate disease process can be constructed. Threshold regression, used in combination with a data technique called Markov decomposition, allows the methods to handle longitudinal time-to-event data by uncoupling a longitudinal record into a sequence of single records. Computational aspects of the methods are straightforward. Simulation experiments that verify the computational feasibility and the validity of the statistical inference are reported in an online supplement. Case applications based on longitudinal observational data from the Osteoarthritis Initiative (OAI) study demonstrate the methodology and its practical use.
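As a concrete illustration of the basic data element, the sketch below generates paired readings (change in time, change in disease level) under a Wiener special case of the stationary-independent-increments property and recovers the drift and variance per unit time by method of moments; all parameter values and variable names are hypothetical, not the paper's.

```python
import numpy as np

# Minimal sketch, assuming a Wiener special case of the stationary-
# independent-increments property: each data element is a pair
# (dt, dy) = (observed change in time, observed change in disease level),
# and dy | dt ~ Normal(mu * dt, sigma2 * dt). All values are hypothetical.
rng = np.random.default_rng(0)
mu_true, sigma2_true = -0.5, 2.0
dt = rng.uniform(0.5, 2.0, size=500)                       # time increments
dy = rng.normal(mu_true * dt, np.sqrt(sigma2_true * dt))   # level increments

mu_hat = dy.sum() / dt.sum()                               # drift per unit time
sigma2_hat = np.mean((dy - mu_hat * dt) ** 2 / dt)         # variance per unit time
print(f"mu_hat = {mu_hat:.3f}, sigma2_hat = {sigma2_hat:.3f}")
```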
Subject(s)
Biostatistics, Statistical Models, Humans, Stochastic Processes, Computer Simulation, Time Factors, Biostatistics/methods
ABSTRACT
The Kaplan-Meier estimator is ubiquitously used to estimate survival probabilities for time-to-event data. It is nonparametric, and thus does not require specification of a survival distribution, but it does assume that the risk set at any time t consists of independent observations. This assumption does not hold for data from paired organ systems such as those in ophthalmology (eyes) or otolaryngology (ears), or for other types of clustered data. In this article, we estimate marginal survival probabilities in the setting of clustered data, and provide confidence limits for these estimates with intra-cluster correlation accounted for by an interval-censored version of the Clayton-Oakes model. We develop a goodness-of-fit test for general bivariate interval-censored data and apply it to the proposed interval-censored version of the Clayton-Oakes model. We also propose a likelihood ratio test for the comparison of survival distributions between two groups in the setting of clustered data under the assumption of a constant between-group hazard ratio. This methodology can be used for both balanced and unbalanced cluster sizes, and also when the cluster size is informative. We compare our test to the ordinary log-rank test and the Lin-Wei (LW) test based on the marginal Cox proportional hazards model with robust standard errors obtained from the sandwich estimator. Simulation results indicate that the ordinary log-rank test inflates the type I error, while the proposed unconditional likelihood ratio test has appropriate type I error and higher power than the LW test. The method is demonstrated in real examples from the Sorbinil Retinopathy Trial and the Age-Related Macular Degeneration Study. Raw data from these two trials are provided.
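For reference, the Clayton-Oakes model referred to above ties the two members of a cluster together through the familiar copula form (notation ours; the paper works with an interval-censored version):

```latex
% Clayton-Oakes bivariate survival function: S_1, S_2 are the marginal
% survival functions of the two cluster members (e.g., fellow eyes) and
% theta > 0 indexes the intra-cluster association, with independence
% recovered as theta -> 0.
S(t_1, t_2) = \left\{ S_1(t_1)^{-\theta} + S_2(t_2)^{-\theta} - 1 \right\}^{-1/\theta},
\qquad \theta > 0 .
```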
Subject(s)
Diabetic Retinopathy, Humans, Proportional Hazards Models, Survival Analysis, Computer Simulation, Likelihood Functions
ABSTRACT
Group sequential design (GSD) has become a popular choice in recent clinical trials, as it improves trial efficiency by providing options for early termination. The implementation of traditional tests for survival analysis (e.g., the log-rank test and the Cox proportional hazards (PH) model) in the GSD setting has been widely discussed. The PH assumption is required for the conventional (sequential) designs; it is, however, often violated in practice. As an alternative, some generalized tests have been proposed (e.g., the Max-Combo test) and their efficacy has been established. In this article, we explore the application of a more flexible, "first hitting time" based threshold regression (TR) model to GSD. TR assumes that a subject's health status is a latent (unobservable) process, and the clinical event of interest occurs when the latent health process hits a pre-specified boundary. Simulation results show that, in most cases, this new method successfully controls the type I error while providing more early-stopping opportunities in the sequential design, even when non-proportional hazards are present.
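The event-time mechanism TR assumes can be sketched directly: with a latent Wiener health process, first hitting times are inverse Gaussian, and a treatment that slows the drift reshapes the whole survival curve rather than shifting a constant hazard ratio. The snippet below, with purely illustrative parameter values, compares event fractions at a few interim-look horizons.

```python
import numpy as np
from scipy.stats import invgauss

# Illustrative sketch (not the authors' code): each subject's latent
# health is a Wiener process started at y0 with negative drift mu, and
# the clinical event is the first hit of the boundary at 0, so event
# times are inverse Gaussian with mean y0/|mu| and shape y0**2/sigma2.
def fht_times(y0, mu, sigma2, n, rng):
    mean, shape = y0 / abs(mu), y0**2 / sigma2
    return invgauss.rvs(mean / shape, scale=shape, size=n, random_state=rng)

rng = np.random.default_rng(1)
control = fht_times(5.0, -0.50, 1.0, 10_000, rng)   # faster latent decline
treated = fht_times(5.0, -0.35, 1.0, 10_000, rng)   # slower latent decline
for tau in (5.0, 10.0, 20.0):                       # interim-look horizons
    print(tau, (control <= tau).mean().round(3), (treated <= tau).mean().round(3))
```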
Subject(s)
Research Design, Computer Simulation, Humans, Proportional Hazards Models, Randomized Controlled Trials as Topic, Survival Analysis
ABSTRACT
Individuals in many observational studies and clinical trials for chronic diseases are enrolled well after onset or diagnosis of their disease. Times to events of interest after enrollment are therefore residual or left-truncated event times. Individuals entering the studies have disease that has advanced to varying extents. Moreover, enrollment usually entails probability sampling of the study population. Finally, event times over a short to moderate time horizon are often of interest in these investigations, rather than more speculative and remote happenings that lie beyond the study period. This research report looks at the issue of delayed entry into these kinds of studies and trials. Time to event for an individual is modelled as a first hitting time of an event threshold by a latent disease process, which is taken to be a Wiener process. It is emphasized that recruitment into these studies often involves length-biased sampling. The requisite mathematics for this kind of sampling and delayed entry is presented, including explicit formulas needed for estimation and inference. Restricted mean survival time (RMST) is taken as the clinically relevant outcome measure, and exact parametric formulas for this measure are derived and presented. The results are extended to settings that involve study covariates using threshold regression methods, and adaptations for clinical trials are described. An extensive case illustration in a clinical trial setting then demonstrates the methods, the interpretation of results, and the harvesting of useful insights. The closing discussion covers a number of important issues and concepts.
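A minimal numerical sketch of the RMST calculation under the Wiener first-hitting-time setup above (so the event time is inverse Gaussian with mean y0/mu and shape y0^2/sigma2); parameter values are hypothetical, and the length-biased-entry adjustment is omitted for brevity:

```python
import numpy as np
from scipy.stats import invgauss
from scipy.integrate import quad

# Sketch: RMST over [0, tau] when time to event is the first hitting
# time of 0 by a Wiener process started at y0 with drift -mu (mu > 0)
# and variance sigma2 -- an inverse Gaussian law with mean y0/mu and
# shape y0**2/sigma2. Values below are illustrative.
y0, mu, sigma2, tau = 4.0, 0.5, 1.0, 10.0
mean, shape = y0 / mu, y0**2 / sigma2
T = invgauss(mean / shape, scale=shape)     # scipy's parameterization

rmst, _ = quad(T.sf, 0.0, tau)              # RMST = integral of survival on [0, tau]
print(f"RMST over [0, {tau}] = {rmst:.3f}")
```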
Subject(s)
Clinical Trials as Topic, Observational Studies as Topic, Time-to-Treatment, Humans, Probability, Regression Analysis, Survival Analysis, Survival Rate
ABSTRACT
The receiver operating characteristic (ROC) curve is commonly used to evaluate and compare the accuracy of classification methods or markers. Estimating ROC curves has been an important problem in various fields, including biometric recognition and diagnostic medicine. In real applications, classification markers are often developed under two or more ordered conditions, such that a natural stochastic ordering exists among the observations. Incorporating such a stochastic ordering into estimation can improve statistical efficiency (Davidov and Herman, 2012). In addition, clustered and correlated data arise when multiple measurements are gleaned from the same subject, making estimation of ROC curves complicated due to within-cluster correlations. In this article, we propose to model the ROC curve using a weighted empirical process to jointly account for the order constraint and the within-cluster correlation structure. The algebraic properties of the resulting summary statistics of the ROC curve, such as its area and partial area, are also studied. The algebraic expressions reduce to those of Davidov and Herman (2012) for independent observations. We derive asymptotic properties of the proposed order-restricted estimators and show that they have smaller mean-squared errors than the existing estimators. Simulation studies also demonstrate better performance of the newly proposed estimators over existing methods for finite samples. The proposed method is further exemplified with the fingerprint matching data from the National Institute of Standards and Technology Special Database 4.
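To fix ideas, here is a minimal cluster-weighted empirical AUC in which each observation can be down-weighted by its cluster size so that every subject contributes equally; this is our illustration of the weighting idea, not the paper's order-restricted estimator:

```python
import numpy as np

# Cluster-weighted empirical AUC (illustrative): the Mann-Whitney kernel
# is averaged with weights, e.g. 1/(cluster size), so subjects with many
# repeated measurements do not dominate the estimate.
def weighted_auc(s_pos, w_pos, s_neg, w_neg):
    diff = s_pos[:, None] - s_neg[None, :]
    kernel = (diff > 0) + 0.5 * (diff == 0)      # Mann-Whitney kernel with ties
    w = w_pos[:, None] * w_neg[None, :]
    return float((kernel * w).sum() / w.sum())

rng = np.random.default_rng(2)
s_pos, s_neg = rng.normal(1.0, 1.0, 60), rng.normal(0.0, 1.0, 80)
w_pos, w_neg = np.ones(60), np.ones(80)          # replace with 1/cluster-size weights
print(round(weighted_auc(s_pos, w_pos, s_neg, w_neg), 3))
```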
Subject(s)
Biometry, Statistical Models, Area Under the Curve, Biomarkers, Computer Simulation, ROC Curve
ABSTRACT
BACKGROUND: Since 1999, over 702,000 people in the US have died of a drug overdose, and the drug overdose death rate has increased from 6.2 to 21.8 per 100,000. Employment status and occupation may be important social determinants of overdose deaths. OBJECTIVES: Estimate the risk of drug overdose death by employment status and occupation, controlling for other social and demographic factors known to be associated with overdose deaths. METHODS: Proportional hazards models were used to study US adults in the National Longitudinal Mortality Study with baseline measurements taken in the early 2000s and up to 6 years of follow-up (n = 438,739; 53% female, 47% male). Comparisons were made between adults with different employment statuses (employed, unemployed, disabled, etc.) and occupations (sales, construction, service occupations, etc.). Models were adjusted for age, sex, race/ethnicity, education, income and marital status. RESULTS: Adults who were disabled (hazard ratio (HR) = 6.96, 95% CI = 6.81-7.12), unemployed (HR = 4.20, 95% CI = 4.09-4.32) and retired (HR = 2.94, 95% CI = 2.87-3.00) were at higher risk of overdose death relative to those who were employed. By occupation, those working in service (HR = 2.05, 95% CI = 1.97-2.13); construction and extraction (HR = 1.69, 95% CI = 1.64-1.76); management, business and financial (HR = 1.39, 95% CI = 1.33-1.44); and installation, maintenance and repair (HR = 1.32, 95% CI = 1.25-1.40) occupations displayed higher risk relative to professional occupations. CONCLUSIONS: In a large national cohort followed prospectively for up to 6 years, several employment statuses and occupations were associated with overdose deaths, independent of a range of other factors. Efforts to prevent overdose deaths may benefit from focusing on these high-risk groups.
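A sketch of the kind of proportional hazards fit reported above, run on simulated stand-in data (the NLMS microdata are restricted-access; column names and effect sizes here are hypothetical):

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Simulated stand-in for the proportional hazards analysis described
# above; all column names and effect sizes are hypothetical.
rng = np.random.default_rng(3)
n = 2000
unemployed = rng.binomial(1, 0.1, n)
age = rng.uniform(25, 65, n)
rate = 0.02 * np.exp(1.4 * unemployed + 0.01 * (age - 45))  # true hazard
t = rng.exponential(1 / rate)                               # latent event times
df = pd.DataFrame({
    "time": np.minimum(t, 6.0),             # up to 6 years of follow-up
    "event": (t <= 6.0).astype(int),        # overdose death observed?
    "unemployed": unemployed, "age": age,
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
cph.print_summary()   # hazard ratios with 95% CIs, cf. the HRs quoted above
```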
Subject(s)
Drug Overdose/mortality, Employment/statistics & numerical data, Occupations/statistics & numerical data, Adult, Aged, Cause of Death, Cohort Studies, Ethnicity, Female, Humans, Longitudinal Studies, Male, Middle Aged, Proportional Hazards Models, Risk Factors, United States/epidemiology
ABSTRACT
Standard methods for two-sample tests, such as the t-test and the Wilcoxon rank sum test, may lead to incorrect type I errors when applied to longitudinal or clustered data. Recent alternative two-sample tests for clustered data often require certain assumptions on the correlation structure and/or noninformative cluster size. In this paper, based on a novel pseudolikelihood for correlated data, we propose a score test that requires neither knowledge of the correlation structure nor the assumption that data are missing at random. The proposed score test can capture differences in the mean and variance between two groups simultaneously. We use projection theory to derive the limiting distribution of the test statistic, in which the covariance matrix can be empirically estimated. We conduct simulation studies to evaluate the proposed test and compare it with existing methods. To illustrate the usefulness of the proposed test, we use it to compare self-reported weight-loss data from a friends'-referral group with data from an Internet self-joining group.
Subject(s)
Biometry/methods, Self Report, Weight Loss, Cluster Analysis, Computer Simulation, Humans, Internet, Longitudinal Studies
ABSTRACT
The Cox proportional hazards (PH) model is a common statistical technique used for analyzing time-to-event data. The assumption of PH, however, is not always appropriate in real applications. In cases where the assumption is not tenable, threshold regression (TR) and other survival methods, which do not require the PH assumption, are available and widely used. These alternative methods generally assume that the study data constitute simple random samples. In particular, TR has not been studied in the setting of complex surveys that involve (1) differential selection probabilities of study subjects and (2) intracluster correlations induced by multistage cluster sampling. In this paper, we extend TR procedures to account for complex sampling designs. The pseudo-maximum likelihood estimation technique is applied to estimate the TR model parameters. Computationally efficient Taylor linearization variance estimators that consider both the intracluster correlation and the differential selection probabilities are developed. The proposed methods are evaluated by using simulation experiments with various complex designs and illustrated empirically by using mortality-linked Third National Health and Nutrition Examination Survey Phase II genetic data.
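Our illustration of the pseudo-maximum likelihood idea: each subject's log-likelihood contribution is weighted by its survey weight (inverse selection probability). The inverse Gaussian first-hitting-time likelihood stands in for the TR model; clustering enters only the variance step (Taylor linearization), which is omitted here, and all values are simulated.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import invgauss

# Weighted (pseudo-) log-likelihood for an inverse Gaussian
# first-hitting-time model; w_i are hypothetical survey weights.
def neg_pseudo_loglik(params, t, w):
    mean, shape = np.exp(params)                 # keep both parameters positive
    return -np.sum(w * invgauss.logpdf(t, mean / shape, scale=shape))

rng = np.random.default_rng(4)
t = invgauss.rvs(2.0 / 4.0, scale=4.0, size=300, random_state=rng)  # mean 2, shape 4
w = rng.uniform(0.5, 3.0, size=300)              # hypothetical survey weights
fit = minimize(neg_pseudo_loglik, x0=np.log([1.0, 1.0]), args=(t, w))
print("estimated mean, shape:", np.exp(fit.x).round(3))
```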
Subject(s)
Likelihood Functions, Regression Analysis, Cluster Analysis, Computer Simulation, Genetic Databases, Humans, Mortality, Multivariate Analysis, Nutrition Surveys, Survival Analysis
ABSTRACT
BACKGROUND: Campylobacter is a leading cause of foodborne illness in the United States. Campylobacter infections have been associated with individual risk factors, such as the consumption of poultry and raw milk. Recently, a Maryland-based study identified community socioeconomic and environmental factors that are also associated with campylobacteriosis rates. However, no previous studies have evaluated the association between community risk factors and campylobacteriosis rates across multiple U.S. states. METHODS: We obtained Campylobacter case data (2004-2010; n = 40,768) from the Foodborne Diseases Active Surveillance Network (FoodNet) and socioeconomic and environmental data from the 2010 Census of Population and Housing, the 2011 American Community Survey, and the 2007 U.S. Census of Agriculture. We linked data by zip code and derived incidence rate ratios using negative binomial regression models. RESULTS: Community socioeconomic and environmental factors were associated with both lower and higher campylobacteriosis rates. Zip codes with higher percentages of African Americans had lower rates of campylobacteriosis (incidence rate ratio [IRR] = 0.972; 95% confidence interval [CI] = 0.970, 0.974). In Georgia, Maryland, and Tennessee, three leading broiler chicken producing states, zip codes with broiler operations had incidence rates that were 22% (IRR = 1.22; 95% CI = 1.03, 1.43), 16% (IRR = 1.16; 95% CI = 0.99, 1.37), and 35% (IRR = 1.35; 95% CI = 1.18, 1.53) higher, respectively, than those of zip codes without broiler operations. In Minnesota and New York FoodNet counties, two top dairy-producing areas, zip codes with dairy operations had significantly higher campylobacteriosis incidence rates (IRR = 1.37; 95% CI = 1.22, 1.55 and IRR = 1.19; 95% CI = 1.04, 1.36, respectively). CONCLUSIONS: Community socioeconomic and environmental factors are important to consider when evaluating the relationship between possible risk factors and Campylobacter infection.
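The rate model above is a standard negative binomial regression with a log-population offset; a sketch on simulated stand-in data (the covariate, baseline rate, and effect size are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

# Negative binomial rate model with a log-population offset
# (illustrative data; the real covariates were zip-code socioeconomic
# and agricultural measures).
rng = np.random.default_rng(5)
n = 500
broiler = rng.binomial(1, 0.3, n).astype(float)   # zip code has broiler operations
pop = rng.integers(1_000, 50_000, n).astype(float)
mu = pop * 2e-4 * np.exp(0.25 * broiler)          # true rate times population
cases = rng.poisson(mu * rng.gamma(5.0, 0.2, n))  # overdispersed counts

X = sm.add_constant(broiler)
res = sm.GLM(cases, X, family=sm.families.NegativeBinomial(),
             offset=np.log(pop)).fit()            # offset turns counts into rates
print("IRR for broiler operations:", np.exp(res.params[1]).round(3))
```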
Subject(s)
Campylobacter Infections/epidemiology, Foodborne Diseases/epidemiology, Poultry Products/poisoning, Adolescent, Adult, Aged, Aged 80 and over, Animal Husbandry, Animals, Campylobacter Infections/etiology, Chickens, Child, Preschool Child, Environment, Female, Foodborne Diseases/etiology, Health Surveys, Humans, Incidence, Infant, Newborn, Male, Middle Aged, Statistical Models, Public Health Surveillance, Residence Characteristics, Risk Factors, Socioeconomic Factors, United States/epidemiology, Young Adult
ABSTRACT
Osteoporotic hip fractures in the elderly are associated with high mortality in the first year following fracture and a high incidence of disability among survivors. We study first and second fractures in elderly women using data from the Study of Osteoporotic Fractures. We present a new conceptual framework, stochastic model, and statistical methodology for time to fracture. Our approach gives additional insights into the patterns of first and second fractures and the concomitant risk factors. Our modeling perspective involves a novel time-to-event methodology called threshold regression, which is based on the plausible idea that many events occur when an underlying process describing the health or condition of a person or system encounters a critical boundary or threshold for the first time. In the parlance of stochastic processes, this time to event is a first hitting time of the threshold. The underlying process in our model is a composite of a chronic degradation process for skeletal health and a random stream of shocks from external traumas, which together trigger fracture events.
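An illustrative simulation in the spirit of the composite model above, with all parameter values hypothetical: chronic degradation is a downward-drifting Wiener path, external traumas arrive as a Poisson stream of random shocks, and fracture occurs when the composite path first reaches the zero threshold.

```python
import numpy as np

# Composite degradation-plus-shocks process (illustrative parameters).
rng = np.random.default_rng(6)
y0, drift, sigma, shock_rate, dt = 10.0, -0.05, 0.3, 0.2, 0.01

def time_to_fracture(t_max=200.0):
    y, t = y0, 0.0
    while y > 0 and t < t_max:
        y += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        if rng.random() < shock_rate * dt:   # an external trauma arrives
            y -= rng.exponential(2.0)        # random shock to skeletal health
        t += dt
    return t if y <= 0 else np.inf           # inf == no fracture by t_max

times = np.array([time_to_fracture() for _ in range(500)])
print("median time to fracture:", np.median(times[np.isfinite(times)]))
```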
Subject(s)
Hip Fractures/etiology, Statistical Models, Osteoporosis/complications, Aged, Biostatistics/methods, Disease Progression, Female, Hip Fractures/epidemiology, Humans, Proportional Hazards Models, Regression Analysis, Risk Factors, Stochastic Processes, Survival Analysis, Time Factors
ABSTRACT
Varying-coefficient models have attracted an increasing share of statistical research and are now applied to censored data analysis in medical studies. We incorporate such flexible semiparametric regression tools for interval-censored data with a cured proportion. We adopt a two-part model to describe the overall survival experience for such complicated data. To fit the unknown functional components in the model, we take the local polynomial approach with bandwidth chosen by cross-validation. We establish the consistency and asymptotic distribution of the estimators and propose to use the bootstrap for inference. We construct a BIC-type model selection method to recommend an appropriate specification of the parametric and nonparametric components in the model. We conduct extensive simulations to assess the performance of our methods. An application to decompression sickness data illustrates our methods.
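The local polynomial idea at the core of the fitting step can be sketched in a few lines; this is a generic local-linear smoother with a Gaussian kernel on toy data, not the paper's two-part cure-model estimator. The bandwidth h plays the role chosen by cross-validation in the paper.

```python
import numpy as np

# Generic local-linear smoother: at each target point x0, fit a
# weighted straight line and report its intercept as the fitted value.
def local_linear(x0, x, y, h):
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)        # Gaussian kernel weights
    X = np.column_stack([np.ones_like(x), x - x0])
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return beta[0]                                # fitted value at x0

rng = np.random.default_rng(7)
x = rng.uniform(0, 1, 300)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 300)
print([round(local_linear(g, x, y, h=0.1), 2) for g in np.linspace(0, 1, 5)])
```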
Subject(s)
Statistical Data Interpretation, Statistical Models, Survival Analysis, Computer Simulation, Decompression Sickness/physiopathology, Air Embolism/physiopathology, Female, Humans, Male
ABSTRACT
The area under the true ROC curve (AUC) is routinely used to determine how strongly a given model discriminates between the levels of a binary outcome. Standard inference with the AUC requires that outcomes be independent of each other. To overcome this limitation, a method was developed for the estimation of the variance of the AUC in the setting of two-level hierarchical data using probit-transformed prediction scores generated from generalized estimating equation models, thereby allowing for the application of inferential methods. This manuscript presents an extension of this approach so that inference for the AUC may be performed in a three-level hierarchical data setting (e.g., eyes nested within persons and persons nested within families). A method that accounts for the effect of tied prediction scores on inference is also described. The performance of 95% confidence intervals around the AUC was assessed through the simulation of three-level clustered data in multiple settings, including ones with tied data and variable cluster sizes. Across all settings, the actual 95% confidence interval coverage varied from 0.943 to 0.958, and the ratio of the theoretical variance to the empirical variance of the AUC varied from 0.920 to 1.013. The results are better than those from existing methods. Two examples of applying the proposed methodology are presented.
ABSTRACT
Birth weight and gestational age are important measures of a newborn's intrinsic health, serving both as outcome measures and as explanatory variables in health studies. The measures are highly correlated but occasionally inconsistent. We anticipate that health researchers and other scientists would be helped by summary indexes of birth weight and gestational age that give more precise indications of whether a birth outcome is healthy or not. We propose a pair of indexes that we refer to as the birth normalcy index (BNI) and the birth discrepancy index (BDI). Both indexes are simple functions of birth weight and gestational age and, in logarithmic form, are orthogonal by construction. The BNI gauges whether the birth weight and gestational age combination is in a normal range. The BDI gauges whether birth weight and gestational age are consistent with each other. We present a three-component mixture model for BNI, with the components representing premature, at-risk, and healthy births. The BNI distribution is derived from a stochastic model of fetal development proposed by Whitmore and Su (2007, Lifetime Data Analysis 13, 161-190) and takes the form of a mixture of inverse Gaussian distributions. We present a noncentral t-distribution as a model for BDI. BNI and BDI are also well suited for making comparisons of birth outcomes in different reference populations; a simple z-score and t-score are proposed for such comparisons. The BNI and BDI distributions can be estimated for births in any reference population of interest using threshold regression.
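The abstract does not give the exact index formulas, but the stated orthogonality in logarithmic form can be illustrated with a generic sum-and-difference construction (our notation, not necessarily the authors' definitions): with standardized log birth weight u and standardized log gestational age v,

```latex
% Sum and difference of equally scaled log measures are uncorrelated
% regardless of their correlation, which is one way a normalcy-type and
% a discrepancy-type index can be orthogonal "by construction".
\mathrm{BNI} \propto u + v, \qquad \mathrm{BDI} \propto u - v, \qquad
\operatorname{Cov}(u + v,\; u - v) = \operatorname{Var}(u) - \operatorname{Var}(v) = 0 .
```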
Subject(s)
Biometry/methods, Birth Weight, Statistical Data Interpretation, Gestational Age, Statistical Models, Regression Analysis, Computer Simulation, Humans, Reference Values
ABSTRACT
Time-to-event data with time-varying covariates pose an interesting challenge for statistical modeling and inference, especially where the data require a regression structure but are not consistent with the proportional hazards assumption. Threshold regression (TR) is a relatively new methodology based on the concept that degradation or deterioration of a subject's health follows a stochastic process and failure occurs when the process first reaches a failure state or threshold (a first hitting time). Survival data with time-varying covariates consist of sequential observations on the level of degradation and/or on covariates of the subject, prior to the occurrence of the failure event. Encounters with this type of data structure abound in practical settings for survival analysis, and there is a pressing need for simple regression methods to handle the longitudinal aspect of the data. Using a Markov property to decompose a longitudinal record into a series of single records is one strategy for dealing with this type of data. This study looks at the theoretical conditions under which this Markov approach is valid. The approach is called threshold regression with Markov decomposition, or Markov TR for short. A number of important special cases, such as data with unevenly spaced time points and competing risks as stopping modes, are discussed. We show that a proportional hazards regression model with time-varying covariates is consistent with the Markov TR model. The Markov TR procedure is illustrated by a case application to a study of lung cancer risk. The procedure is also shown to be consistent with the use of an alternative time scale. Finally, we present the connection of the procedure to the concept of a collapsible survival model.
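The decomposition itself is a simple data operation: each consecutive pair of visits becomes a single record carrying the time increment, the covariate value at the start of the interval, and the event status at its end. A sketch with hypothetical column names:

```python
import pandas as pd

# Markov decomposition of a longitudinal record into single records.
long = pd.DataFrame({
    "id":     [1, 1, 1, 2, 2],
    "time":   [0.0, 1.0, 2.5, 0.0, 2.0],
    "smoker": [1, 1, 0, 0, 0],
    "event":  [0, 0, 1, 0, 0],    # 1 = failure observed at that visit
})

records = []
for pid, g in long.sort_values("time").groupby("id"):
    for prev, curr in zip(g.itertuples(), g.iloc[1:].itertuples()):
        records.append({"id": pid,
                        "dt": curr.time - prev.time,   # interval length
                        "smoker": prev.smoker,         # covariate at interval start
                        "status": curr.event})         # event by interval end
print(pd.DataFrame(records))
```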
Subject(s)
Biostatistics, Risk Assessment/methods, Survival Analysis, Adult, Female, Humans, Longitudinal Studies, Lung Neoplasms/epidemiology, Lung Neoplasms/mortality, Markov Chains, Middle Aged, Nurses/statistics & numerical data, Proportional Hazards Models, Regression Analysis, Smoking, Time Factors
ABSTRACT
Proportional hazards (PH) regression is a standard methodology for analyzing survival and time-to-event data. The proportional hazards assumption of PH regression, however, is not always appropriate. In addition, PH regression focuses mainly on hazard ratios and thus does not offer many insights into the underlying determinants of survival. These limitations have led statistical researchers to explore alternative methodologies. Threshold regression (TR) is one of these alternative methodologies (see Lee and Whitmore, Stat Sci 21:501-513, 2006, for a review). The connection between PH regression and TR has been examined in previously published work, but the investigations have been limited in scope. In this article, we study the connections between these two regression methodologies in greater depth and show that PH regression is, for most purposes, a special case of TR. We show two methods of construction by which TR models can yield PH functions for survival times, one based on altering the TR time scale and the other based on varying the TR boundary. We discuss how to estimate the TR time scale and boundary, with or without the PH assumption. A case demonstration is used to highlight the greater understanding of scientific foundations that TR can offer in comparison to PH regression. Finally, we discuss the potential benefits of positioning PH regression within the first-hitting-time context of TR.
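For reference, the PH property that both TR constructions must reproduce can be stated compactly (standard identity, our notation):

```latex
% Proportional hazards holds exactly when the conditional survival
% function is a power of the baseline, with the hazard ratio as the
% exponent -- the target of the TR time-scale and boundary constructions.
h(t \mid x) = h_0(t)\, e^{\beta^{\top} x}
\quad\Longleftrightarrow\quad
S(t \mid x) = S_0(t)^{\exp(\beta^{\top} x)} .
```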
Subject(s)
Proportional Hazards Models, Regression Analysis, Survival Analysis, Humans, Stochastic Processes, Survival Rate
ABSTRACT
The threshold regression model is an effective alternative to the Cox proportional hazards regression model when the proportional hazards assumption is not met. This paper considers variable selection for threshold regression. This model has separate regression functions for the initial health status and for the speed of degradation in health. This flexibility is an important advantage when considering relevant risk factors for a complex time-to-event model, where one needs to decide which variables should be included in the regression function for the initial health status, in the function for the speed of degradation in health, or in both functions. In this paper, we extend the broken adaptive ridge (BAR) method, originally designed for variable selection in a single regression function, to simultaneous variable selection in both regression functions of the threshold regression model. We establish the variable selection consistency of the proposed method and the asymptotic normality of the estimator of the non-zero regression coefficients. Simulation results show that our method outperformed threshold regression without variable selection and variable selection based on the Akaike information criterion. We apply the proposed method to data from an HIV drug adherence study in which electronic monitoring of drug intake is used to identify risk factors for non-adherence.
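The BAR idea is an iteratively reweighted ridge: each step solves a ridge problem whose penalty on every coefficient is scaled by the inverse square of that coefficient's previous value, driving small coefficients toward exact zero. A minimal sketch on ordinary least squares (the paper applies the reweighting jointly to the two TR regression functions; tuning values here are illustrative):

```python
import numpy as np

# Broken adaptive ridge on least squares (illustrative tuning values).
def bar(X, y, lam=1.0, xi=1.0, n_iter=50):
    p = X.shape[1]
    beta = np.linalg.solve(X.T @ X + xi * np.eye(p), X.T @ y)   # ridge start
    for _ in range(n_iter):
        D = np.diag(lam / (beta**2 + 1e-12))                    # adaptive weights
        beta = np.linalg.solve(X.T @ X + D, X.T @ y)
    return beta

rng = np.random.default_rng(8)
X = rng.standard_normal((200, 8))
y = X @ np.array([2.0, -1.5, 0, 0, 0, 0, 0, 0]) + rng.standard_normal(200)
print(np.round(bar(X, y), 3))   # signal coefficients survive, noise shrinks to ~0
```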
ABSTRACT
Time-to-event analysis of sexually transmitted infection data is often complicated by the existence of nonproportional hazards and nonlinear independent-variable effects. Methods that avoid the proportional hazards assumption, such as threshold regression models, have been used successfully in many applications. This paper seeks to extend the existing threshold regression models to accommodate nonlinear independent-variable effects. Specifically, we incorporated penalized splines and regression splines into the threshold regression models for added modeling flexibility. Cross-validation methods were used to select the number of knots and to determine the smoothing parameters. Variance estimates were proposed for inference purposes. Simulation results showed that the proposed methods achieve nonparametric function and parametric coefficient estimates close to their true values, and demonstrated satisfactory performance of the variance estimates. Using the proposed methods, we analyzed time from sexual debut to first infection with Chlamydia trachomatis in a group of young women. The analysis shows that the lifetime number of sexual partners has a nonlinear effect on the risk of C. trachomatis infection and that infection risk differed by ethnicity and age at sexual debut.
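The spline device is a basis expansion of the covariate inside the TR linear predictor, so its effect can bend rather than stay linear; in the penalized version, a roughness penalty on the basis coefficients enters the likelihood, with the smoothing parameter chosen by cross-validation. A minimal sketch (basis choice, knots, and coefficients are hypothetical):

```python
import numpy as np

# Truncated-power cubic spline basis for a nonlinear covariate effect f(x).
def spline_basis(x, knots):
    cols = [x, x**2, x**3] + [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

x = np.linspace(0, 20, 200)                   # e.g., lifetime number of partners
B = spline_basis(x, knots=[5.0, 10.0, 15.0])
theta = np.array([0.30, -0.02, 0.0005, -0.001, 0.0008, -0.0004])  # hypothetical
effect = B @ theta                            # nonlinear covariate effect f(x)
print(B.shape, np.round(effect[::50], 3))
```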
Subject(s)
Statistical Models, Sexually Transmitted Diseases/epidemiology, Adolescent, Age Factors, Algorithms, Analysis of Variance, Biostatistics/methods, Chlamydia Infections/epidemiology, Chlamydia trachomatis, Coitus, Computer Simulation, Female, Humans, Longitudinal Studies, Proportional Hazards Models, Racial Groups/statistics & numerical data, Regression Analysis, Risk Factors, Sexual Partners, Nonparametric Statistics, Time Factors, Urban Population
ABSTRACT
A case-control study of lung cancer mortality in U.S. railroad workers in jobs with and without diesel exhaust exposure is reanalyzed using a new threshold regression methodology. The study included 1256 workers who died of lung cancer and 2385 controls who died primarily of circulatory system diseases. Diesel exhaust exposure was assessed using railroad job history from the US Railroad Retirement Board and an industrial hygiene survey. Smoking habits were available from next of kin, and potential asbestos exposure was assessed by job history review. The new analysis reassesses lung cancer mortality and examines circulatory system disease mortality. Workers in jobs with regular exposure to diesel exhaust had a survival pattern characterized by an initial delay in mortality, followed by a rapid deterioration of health prior to death. The pattern is seen in subjects dying of lung cancer, circulatory system diseases, and other causes. The unique pattern is illustrated using a new type of Kaplan-Meier survival plot in which the time scale represents a measure of disease progression rather than calendar time. The disease progression scale accounts for a healthy-worker effect when describing the effects of cumulative exposures on mortality.
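The plotting idea can be sketched directly: run the ordinary Kaplan-Meier estimator, but with durations first mapped onto an estimated disease-progression scale. Below, the transform and the toy data are hypothetical stand-ins (the paper estimates the progression scale from the TR model):

```python
import numpy as np
from lifelines import KaplanMeierFitter

# Kaplan-Meier on a disease-progression scale instead of calendar time.
rng = np.random.default_rng(9)
t = rng.weibull(1.5, 300) * 20                 # calendar event/censoring times
event = rng.binomial(1, 0.7, 300)              # 1 = death observed

def progression(t):                            # hypothetical running-time transform
    return t ** 0.8

kmf = KaplanMeierFitter()
kmf.fit(progression(t), event_observed=event)  # survival vs. disease progression
print(kmf.survival_function_.head())
```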
ABSTRACT
The recent rise in opioid-related overdose deaths stresses the importance of understanding how heroin use disorders persist and what interventions are best suited for treating these illnesses. Trends show that there are diverse pathways leading to heroin use disorder that span multiple generations, but little is known about how different generations utilize and respond to treatment. This study provides insight into treatment utilization by young, middle-aged, and older adults through examination of an unusually rich longitudinal dataset of substance use disorder clients in Maryland who were treated for heroin use. Results show that clear patterns of treatment readmission emerge across generations in treatment-naïve clients with regard to gender, ethnicity, employment, geographical region, and treatment type/intensity. In particular, Millennials comprise the majority of the clients receiving heroin use disorder treatment and are the largest contributor to these readmission patterns. Millennials are also given opioid maintenance therapy (OMT) more frequently than other generations, while exhibiting a strong avoidance of treatment. Generational differences in treatment decisions and outcomes over the course of a treatment career are important for understanding the nature of the current opioid epidemic, and can play an important role in directing heroin use disorder treatment efforts and improving models of care.
Subject(s)
Heroin Dependence/rehabilitation, Opiate Substitution Treatment/statistics & numerical data, Patient Readmission/statistics & numerical data, Adolescent, Adult, Age Factors, Aged, Cohort Studies, Female, Heroin Dependence/epidemiology, Humans, Male, Maryland/epidemiology, Middle Aged, Young Adult
ABSTRACT
Massively parallel sequencing (a.k.a. next-generation sequencing, NGS) technology has emerged as a powerful tool for characterizing genomic profiles. Among many NGS applications, RNA sequencing (RNA-Seq) has gradually become a standard tool for global transcriptomic monitoring. Although the cost of NGS experiments has dropped steadily, the high sequencing cost and bioinformatic complexity are still obstacles for many biomedical projects. Unlike earlier fluorescence-based technologies such as microarrays, modelling of NGS data must account for discrete count data. In addition to sample size, sequencing depth also directly relates to the experimental cost. Consequently, given a total budget and pre-specified unit experimental costs, the study design issue in RNA-Seq is conceptually a more complex multi-dimensional constrained optimization problem, rather than the one-dimensional sample size calculation of the traditional hypothesis-testing setting. In this paper, we propose a statistical framework, namely "RNASeqDesign", that utilizes pilot data for power calculation and study design of RNA-Seq experiments. The approach is based on mixture-model fitting of the p-value distribution from pilot data and a parametric bootstrap procedure based on approximated Wald test statistics to infer genome-wide power for optimal sample size and sequencing depth. We further illustrate five practical study design tasks for practitioners. We perform simulations and three real applications to evaluate the performance and compare it with existing methods.
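A minimal sketch of the first ingredient, fitting a mixture to pilot p-values to separate null (uniform) from alternative genes; the beta-uniform mixture form and the simulated data are our illustration, and the bootstrap-based power extrapolation over sample size and depth is omitted:

```python
import numpy as np
from scipy.optimize import minimize

# Fit a beta-uniform mixture f(p) = pi0 + (1 - pi0) * a * p**(a - 1)
# to pilot p-values (simulated here); pi0 estimates the null proportion.
rng = np.random.default_rng(10)
p = np.concatenate([rng.uniform(size=8000),          # null genes
                    rng.beta(0.3, 1.0, size=2000)])  # alternative genes

def neg_loglik(params):
    pi0 = 1.0 / (1.0 + np.exp(-params[0]))           # logit -> (0, 1)
    a = np.exp(params[1])                            # log -> (0, inf)
    return -np.sum(np.log(pi0 + (1.0 - pi0) * a * p ** (a - 1.0)))

fit = minimize(neg_loglik, x0=np.array([0.0, 0.0]))
pi0_hat = 1.0 / (1.0 + np.exp(-fit.x[0]))
print("estimated null proportion:", round(pi0_hat, 3))
```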