ABSTRACT
The polygenic score (PGS) is an important tool for the genetic prediction of complex traits. However, no existing resource provides comprehensive PGSs computed from published summary statistics, and the different PGS methods are difficult to implement and run because of the complexity of their pipelines and parameter settings. To address these issues, we introduce a new resource, PGS-Depot, containing the most comprehensive set of publicly available disease-related GWAS summary statistics. PGS-Depot includes 5585 high-quality summary statistics (1933 for quantitative and 3652 for binary traits) curated from 1564 traits in European and East Asian populations. A standardized best-practice pipeline is used to implement 11 summary-statistics-based PGS methods, each with different model assumptions and estimation procedures. The prediction performance of each method can be compared for both in- and cross-ancestry populations, and users can also submit their own summary statistics to obtain custom PGSs with the available methods. Other features include searching for PGSs by trait name, publication, cohort information, population, or the MeSH ontology tree, and searching for trait descriptions with the experimental factor ontology (EFO). All scores, SNP effect sizes, and summary statistics can be downloaded via FTP. PGS-Depot is freely available at http://www.pgsdepot.net.
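As a point of reference for readers new to PGSs, the sketch below shows the generic polygenic score formula (a weighted sum of allele dosages, with per-SNP weights taken from GWAS-derived effect sizes). It is only an illustration of that formula; the variable names are hypothetical and it does not reflect PGS-Depot's pipeline, file formats, or any of the 11 implemented methods.

```python
# Minimal sketch of the generic PGS formula: PGS_i = sum_j beta_j * dosage_ij.
# Names and data are illustrative only, not PGS-Depot's internal pipeline.
import numpy as np

rng = np.random.default_rng(0)

n_individuals, n_snps = 5, 1000
# Genotype dosages coded 0/1/2 (copies of the effect allele).
genotypes = rng.integers(0, 3, size=(n_individuals, n_snps)).astype(float)
# Per-SNP effect sizes, e.g. as estimated by a summary-statistics-based PGS method.
effect_sizes = rng.normal(0.0, 0.01, size=n_snps)

pgs = genotypes @ effect_sizes  # one score per individual
print(pgs)
```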
Subjects
Biostatistics; Multifactorial Inheritance; Genome-Wide Association Study; Multifactorial Inheritance/genetics; Phenotype; Polymorphism, Single Nucleotide; Biostatistics/methods
ABSTRACT
The study of treatment effects is often complicated by noncompliance and missing data. In the one-sided noncompliance setting, where the complier and noncomplier average causal effects are of interest, we address outcome missingness of the latent missing at random type (LMAR, also known as latent ignorability). That is, conditional on covariates and the assigned treatment, the missingness may depend on compliance type. Within the instrumental variable (IV) approach to noncompliance, methods have been proposed for handling LMAR outcomes that additionally invoke an exclusion restriction-type assumption on missingness, but no solution has been proposed for when a non-IV approach is used. This article focuses on effect identification in the presence of LMAR outcomes, with a view to flexibly accommodating different principal identification approaches. We show that under treatment assignment ignorability and LMAR only, effect nonidentifiability boils down to a set of two connected mixture equations involving unidentified stratum-specific response probabilities and outcome means. This clarifies that (except in a special case) effect identification generally requires two additional assumptions: a specific missingness mechanism assumption and a principal identification assumption. This provides a template for identifying effects based on separate choices of these assumptions. We consider a range of specific missingness assumptions, including some that have appeared in the literature and some new ones. Incidentally, we find an issue with the existing assumptions and propose a modification to avoid it. Results under different assumptions are illustrated using data from the Baltimore Experience Corps Trial.
Subjects
Models, Statistical; Humans; Data Interpretation, Statistical; Causality; Biostatistics/methods
ABSTRACT
Determining causes of death (CODs) that occur outside of civil registration and vital statistics systems is challenging. A technique called verbal autopsy (VA) is widely adopted to gather information on such deaths in practice. A VA consists of interviewing relatives of a deceased person about symptoms of the deceased in the period leading up to the death, often resulting in multivariate binary responses. While statistical methods have been devised for estimating the cause-specific mortality fractions (CSMFs) for a study population, continued expansion of VA to new populations (or "domains") necessitates approaches that recognize between-domain differences while capitalizing on potential similarities. In this article, we propose such a domain-adaptive method that integrates external between-domain similarity information encoded by a prespecified rooted weighted tree. Given a cause, we use latent class models to characterize the conditional distributions of the responses, which may vary by domain. We specify a logistic stick-breaking Gaussian diffusion process prior along the tree for the class mixing weights, with node-specific spike-and-slab priors to pool information between the domains in a data-driven way. Posterior inference is conducted via a scalable variational Bayes algorithm. Simulation studies show that the domain adaptation enabled by the proposed method improves CSMF estimation and individual COD assignment. We also illustrate and evaluate the method using a validation dataset. The article concludes with a discussion of limitations and future directions.
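To make the "logistic stick-breaking" ingredient concrete, the sketch below shows only the stick-breaking map that turns unconstrained real values (for example, draws from a Gaussian process along a tree, as in the abstract) into latent-class mixing weights. It is a generic illustration under that assumption; the tree-structured diffusion, spike-and-slab priors, and variational algorithm are not reproduced.

```python
# Logistic stick-breaking map from K-1 unconstrained values to K mixing
# weights that sum to one; illustrative only.
import numpy as np

def logistic_stick_breaking(eta):
    """eta: length K-1 array of unconstrained values; returns K weights."""
    v = 1.0 / (1.0 + np.exp(-np.asarray(eta, dtype=float)))  # stick proportions in (0, 1)
    weights, remaining = [], 1.0
    for vk in v:
        weights.append(vk * remaining)
        remaining *= (1.0 - vk)
    weights.append(remaining)  # last class absorbs the remaining mass
    return np.array(weights)

print(logistic_stick_breaking([0.5, -1.0, 2.0]))  # four weights summing to 1
```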
Subjects
Autopsy; Bayes Theorem; Cause of Death; Humans; Autopsy/methods; Models, Statistical; Biostatistics/methods
ABSTRACT
DNA methylation is an important epigenetic mark that modulates gene expression by inhibiting the binding of transcriptional proteins to DNA. As in many other omics experiments, missing values are an important issue, and appropriate imputation techniques help avoid an unnecessary reduction in sample size and optimally leverage the information collected. We consider the case where relatively few samples are processed via an expensive high-density whole genome bisulfite sequencing (WGBS) strategy and a larger number of samples is processed using more affordable low-density, array-based technologies. In such cases, one can impute the low-coverage (array-based) methylation data using the high-density information provided by the WGBS samples. In this paper, we propose an efficient Linear Model of Coregionalisation with informative Covariates (LMCC) to predict missing values based on observed values and covariates. Our model assumes that at each site, the methylation vector of all samples is linked to a set of fixed factors (covariates) and a set of latent factors. Furthermore, we exploit the functional nature of the data and the spatial correlation across sites by placing Gaussian processes on the fixed and latent coefficient vectors, respectively. Our simulations show that the use of covariates can significantly improve the accuracy of imputed values, especially when the missing data contain relevant information about the explanatory variable. We also show that the proposed model is particularly efficient when the number of columns is much greater than the number of rows, which is usually the case in methylation data analysis. Finally, we apply and compare our proposed method with alternative approaches on two real methylation datasets, showing how covariates such as cell type, tissue type, or age can enhance the accuracy of imputed values.
Subjects
DNA Methylation; Epigenesis, Genetic; DNA Methylation/genetics; Humans; Models, Statistical; Epigenomics/methods; Biostatistics/methods
ABSTRACT
There is an increasing interest in the use of joint models for the analysis of longitudinal and survival data. While random effects models have been extensively studied, these models can be hard to implement and the fixed effect regression parameters must be interpreted conditional on the random effects. Copulas provide a useful alternative framework for joint modeling. One advantage of using copulas is that practitioners can directly specify marginal models for the outcomes of interest. We develop a joint model using a Gaussian copula to characterize the association between multivariate longitudinal and survival outcomes. Rather than using an unstructured correlation matrix in the copula model to characterize dependence structure as is common, we propose a novel decomposition that allows practitioners to impose structure (e.g., auto-regressive) which provides efficiency gains in small to moderate sample sizes and reduces computational complexity. We develop a Markov chain Monte Carlo model fitting procedure for estimation. We illustrate the method's value using a simulation study and present a real data analysis of longitudinal quality of life and disease-free survival data from an International Breast Cancer Study Group trial.
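For readers unfamiliar with the copula framework, the sketch below shows only the generic Gaussian-copula mechanism: separately specified marginals coupled through a structured (here AR(1)) correlation matrix. It is a minimal illustration under those assumptions, not the decomposition proposed in the paper, and the marginal choices (normal longitudinal outcomes, a Weibull survival time) are placeholders.

```python
# Generic Gaussian-copula sketch: correlated normal scores pushed through
# user-chosen marginal quantile functions, with an AR(1) correlation block.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

K, rho, n = 4, 0.6, 1000
# AR(1) correlation among the K outcomes (e.g., repeated QoL measurements plus survival).
R = rho ** np.abs(np.subtract.outer(np.arange(K), np.arange(K)))

z = rng.multivariate_normal(np.zeros(K), R, size=n)   # correlated normal scores
u = stats.norm.cdf(z)                                  # uniform scores
longitudinal = stats.norm.ppf(u[:, :3], loc=50.0, scale=10.0)   # three longitudinal outcomes
survival_time = stats.weibull_min.ppf(u[:, 3], c=1.5, scale=5.0)  # one survival outcome
print(longitudinal[:2], survival_time[:2])
```

Imposing structure such as AR(1) on the correlation matrix, rather than leaving it unstructured, is what reduces the number of dependence parameters and hence the computational burden discussed in the abstract.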
Subjects
Bayes Theorem; Models, Statistical; Humans; Longitudinal Studies; Survival Analysis; Markov Chains; Breast Neoplasms/mortality; Monte Carlo Method; Normal Distribution; Female; Data Interpretation, Statistical; Biostatistics/methods
ABSTRACT
With rapid development of techniques to measure brain activity and structure, statistical methods for analyzing modern brain-imaging data play an important role in the advancement of science. Imaging data that measure brain function are usually multivariate high-density longitudinal data and are heterogeneous across both imaging sources and subjects, which lead to various statistical and computational challenges. In this article, we propose a group-based method to cluster a collection of multivariate high-density longitudinal data via a Bayesian mixture of smoothing splines. Our method assumes each multivariate high-density longitudinal trajectory is a mixture of multiple components with different mixing weights. Time-independent covariates are assumed to be associated with the mixture components and are incorporated via logistic weights of a mixture-of-experts model. We formulate this approach under a fully Bayesian framework using Gibbs sampling where the number of components is selected based on a deviance information criterion. The proposed method is compared to existing methods via simulation studies and is applied to a study on functional near-infrared spectroscopy, which aims to understand infant emotional reactivity and recovery from stress. The results reveal distinct patterns of brain activity, as well as associations between these patterns and selected covariates.
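The mixture-of-experts ingredient mentioned above can be illustrated with a small sketch of covariate-dependent mixing weights via multinomial-logistic (softmax) gating. This is a generic rendering of that one component under assumed names and dimensions; the smoothing-spline mixture components, Gibbs sampler, and DIC-based selection are not reproduced.

```python
# Covariate-dependent mixing weights via softmax gating; illustrative only.
import numpy as np

def gating_weights(x, gamma):
    """x: covariate vector (p,); gamma: (K, p) gating coefficients.
    Returns K mixing weights that sum to one."""
    scores = gamma @ x
    scores -= scores.max()          # subtract max for numerical stability
    w = np.exp(scores)
    return w / w.sum()

gamma = np.array([[0.0, 0.0], [1.0, -0.5], [-0.3, 0.8]])  # K=3 components, p=2 covariates
print(gating_weights(np.array([1.0, 2.0]), gamma))
```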
Subjects
Bayes Theorem; Humans; Longitudinal Studies; Brain/physiology; Brain/diagnostic imaging; Spectroscopy, Near-Infrared/methods; Data Interpretation, Statistical; Models, Statistical; Infant; Multivariate Analysis; Biostatistics/methods
ABSTRACT
Modern longitudinal studies collect multiple outcomes as the primary endpoints to understand the complex dynamics of diseases. Often, especially in clinical trials, the joint variation among the multidimensional responses plays a significant role in assessing the differential characteristics between two or more groups, rather than drawing inferences based on a single outcome. We develop a projection-based two-sample significance test to identify the population-level difference between multivariate profiles observed under a sparse longitudinal design. The methodology is built upon widely adopted multivariate functional principal component analysis to reduce the dimension of the infinite-dimensional multi-modal functions while preserving the dynamic correlation between the components. The test applies to a wide class of (non-stationary) covariance structures of the response, and it detects a significant group difference based on a single p-value, thereby overcoming the issue of adjusting for the multiple p-values that arise from comparing the means of each component separately. Finite-sample numerical studies demonstrate that the test maintains the type I error rate and is powerful in detecting significant group differences compared to state-of-the-art testing procedures. The test is applied to two major longitudinal studies of Alzheimer's disease and Parkinson's disease (PD) patients: the TOMMORROW study of individuals at high risk of mild cognitive impairment, to detect differences in cognitive test scores between the pioglitazone and placebo groups, and the Azillect study, to assess the efficacy of rasagiline as a potential treatment to slow the progression of PD.
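The projection idea can be illustrated in a deliberately simplified setting: densely observed curves are reduced to a few principal-component scores, and a single multivariate two-sample (Hotelling's T-squared) test on the scores yields one p-value. This sketch assumes dense, balanced data and ordinary PCA; the paper's multivariate FPCA for sparse designs and its asymptotic theory are not reproduced.

```python
# Simplified projection-then-test sketch: PCA scores + Hotelling's T^2.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

n1, n2, n_time = 40, 40, 25
t = np.linspace(0, 1, n_time)
group1 = rng.normal(size=(n1, n_time)) + np.sin(2 * np.pi * t)
group2 = rng.normal(size=(n2, n_time)) + np.sin(2 * np.pi * t) + 0.5 * t

X = np.vstack([group1, group2])
Xc = X - X.mean(axis=0)
q = 3                                            # number of retained components
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:q].T                           # component scores per subject

s1, s2 = scores[:n1], scores[n1:]
d = s1.mean(axis=0) - s2.mean(axis=0)
Sp = ((n1 - 1) * np.cov(s1, rowvar=False) + (n2 - 1) * np.cov(s2, rowvar=False)) / (n1 + n2 - 2)
T2 = (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(Sp, d)
F = (n1 + n2 - q - 1) / (q * (n1 + n2 - 2)) * T2
print("single p-value:", stats.f.sf(F, q, n1 + n2 - q - 1))
```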
Subjects
Parkinson Disease; Humans; Longitudinal Studies; Parkinson Disease/drug therapy; Parkinson Disease/physiopathology; Alzheimer Disease/drug therapy; Data Interpretation, Statistical; Multivariate Analysis; Biostatistics/methods; Cognitive Dysfunction; Models, Statistical; Pioglitazone/therapeutic use; Pioglitazone/pharmacology; Principal Component Analysis
ABSTRACT
We present the motivation, experience, and learnings from a data challenge conducted at a large pharmaceutical corporation on the topic of subgroup identification. The data challenge aimed to explore approaches to subgroup identification for future clinical trials. To mimic a realistic setting, participants had access to four Phase III clinical trials from which to derive a subgroup and predict its treatment effect on a future study not accessible to challenge participants. A total of 30 teams with around 100 participants registered for the challenge, primarily from the Biostatistics organization. We outline the motivation for running the challenge, the challenge rules, and the logistics. Finally, we present the results of the challenge, the participant feedback, and the learnings. We also present our view on the implications of the results for exploratory analyses related to treatment effect heterogeneity.
Subjects
Clinical Trials, Phase III as Topic; Motivation; Humans; Clinical Trials, Phase III as Topic/methods; Drug Industry; Research Design; Treatment Outcome; Biostatistics/methods; Data Interpretation, Statistical
ABSTRACT
Rapid advances in high-throughput DNA sequencing technologies have enabled large-scale whole genome sequencing (WGS) studies. Before association analysis between phenotypes and genotypes can be performed, preprocessing and quality control (QC) of the raw sequence data are needed. Because many biostatisticians have not yet worked with WGS data, we first sketch Illumina's short-read sequencing technology. Second, we explain the general preprocessing pipeline for WGS studies. Third, we provide an overview of important QC metrics applied to WGS data at several stages: on the raw data, after mapping and alignment, after variant calling, and after multisample variant calling. Fourth, we illustrate the QC with data from the GENEtic SequencIng Study Hamburg-Davos (GENESIS-HD), a study involving more than 9000 human whole genomes. All samples were sequenced on an Illumina NovaSeq 6000 with an average coverage of 35× using a PCR-free protocol. For QC, one Genome in a Bottle (GIAB) trio was sequenced in four replicates, and one GIAB sample was successfully sequenced 70 times in different runs. Fifth, we provide empirical data on the compression of raw data using the DRAGEN original read archive (ORA). The most important quality metrics in the application were genetic similarity, sample cross-contamination, deviations from the expected heterozygous/homozygous (het/hom) ratio, relatedness, and coverage. The compression ratio of the raw files using DRAGEN ORA was 5.6:1, and compression time was linear in genome coverage. In summary, the preprocessing, joint calling, and QC of large WGS studies are feasible within a reasonable time, and efficient QC procedures are readily available.
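One of the QC metrics listed above, the per-sample het/hom ratio, is simple enough to illustrate with a toy computation on an integer-coded genotype matrix. The coding and thresholds below are assumptions for illustration; a real pipeline would derive genotype calls from multisample VCFs and compare each sample's ratio against the cohort's expected range.

```python
# Toy per-sample heterozygous/homozygous-alternate ratio on simulated calls.
# Genotypes coded 0 (hom ref), 1 (het), 2 (hom alt); illustrative only.
import numpy as np

rng = np.random.default_rng(3)
genotypes = rng.choice([0, 1, 2], size=(5, 100000), p=[0.7, 0.2, 0.1])  # samples x variants

n_het = (genotypes == 1).sum(axis=1)
n_hom_alt = (genotypes == 2).sum(axis=1)
het_hom_ratio = n_het / n_hom_alt
for i, r in enumerate(het_hom_ratio):
    print(f"sample {i}: het/hom = {r:.2f}")  # flag samples deviating from the expected range
```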
Subjects
Quality Control; Whole Genome Sequencing; Humans; Biometry/methods; Biostatistics/methods; High-Throughput Nucleotide Sequencing
ABSTRACT
The goal of the Biostatistics Core of the Alzheimer's Disease Neuroimaging Initiative (ADNI) has been to ensure that sound study designs and statistical methods are used to meet the overall goals of ADNI. We have supported the creation of a well-validated and well-curated longitudinal database of clinical and biomarker information on ADNI participants and helped to make this accessible and usable for researchers. We have developed a statistical methodology for characterizing the trajectories of clinical and biomarker change for ADNI participants across the spectrum from cognitively normal to dementia, including multivariate patterns and evidence for heterogeneity in cognitive aging. We have applied these methods and adapted them to improve clinical trial design. ADNI-4 will offer us a chance to help extend these efforts to a more diverse cohort with an even richer panel of biomarker data to support better knowledge of and treatment for Alzheimer's disease and related dementias. HIGHLIGHTS: The Alzheimer's Disease Neuroimaging Initiative (ADNI) Biostatistics Core provides study design and analytic support to ADNI investigators. Core members develop and apply novel statistical methodology to work with ADNI data and support clinical trial design. The Core contributes to the standardization, validation, and harmonization of biomarker data. The Core serves as a resource to the wider research community to address questions related to the data and study as a whole.
Subjects
Alzheimer Disease; Biostatistics; Neuroimaging; Humans; Alzheimer Disease/diagnostic imaging; Neuroimaging/methods; Biostatistics/methods; Biomarkers; Databases, Factual; Research Design; Longitudinal Studies; Male
ABSTRACT
A major task in the analysis of microbiome data is to identify microbes associated with differing biological conditions. Before conducting analysis, raw data must first be adjusted so that counts from different samples are comparable. A typical approach is to estimate normalization factors by which all counts in a sample are multiplied or divided. However, the inherent variation associated with estimating normalization factors is often not accounted for in subsequent analysis, leading to a loss of precision. Rank normalization is a nonparametric alternative to the estimation of normalization factors in which each count for a microbial feature is replaced by its intrasample rank. Although rank normalization has been successfully applied to microarray analysis in the past, it has yet to be explored for microbiome data, which are characterized by high frequencies of zeros, strongly correlated features, and compositionality. We propose to use rank normalization as an alternative to the estimation of normalization factors and examine its performance when paired with a two-sample t-test. On a rigorous third-party benchmarking simulation, it is shown to offer strong control of the false discovery rate and, at sample sizes greater than 50 per treatment group, to improve on commonly used normalization factors paired with t-tests, Wilcoxon rank-sum tests, and methodologies implemented by R packages. On two real datasets, it yielded valid and reproducible results that were strongly in agreement with the original findings and the existing literature, further demonstrating its robustness and future potential. Availability: The data underlying this article are available online along with R code and supplementary materials at https://github.com/matthewlouisdavisBioStat/Rank-Normalization-Empowers-a-T-Test.
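The procedure described above is straightforward to sketch: replace each count by its within-sample rank, run a feature-wise two-sample t-test, and apply Benjamini-Hochberg control of the false discovery rate. The sketch below is an illustrative re-expression on simulated zero-inflated counts, not the authors' R implementation (which is linked above), and the data-generating choices are placeholders.

```python
# Rank normalization followed by feature-wise Welch t-tests and BH adjustment.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

n_per_group, n_features = 60, 200
counts = rng.negative_binomial(2, 0.1, size=(2 * n_per_group, n_features))
counts[rng.random(counts.shape) < 0.5] = 0           # crude zero inflation
group = np.repeat([0, 1], n_per_group)

# Rank normalization: rank features within each sample (ties get average ranks).
ranks = np.apply_along_axis(stats.rankdata, 1, counts.astype(float))

# Feature-wise Welch t-tests between the two groups.
pvals = np.array([
    stats.ttest_ind(ranks[group == 0, j], ranks[group == 1, j], equal_var=False).pvalue
    for j in range(n_features)
])

# Benjamini-Hochberg adjustment.
order = np.argsort(pvals)
adj = np.empty_like(pvals)
adj[order] = np.minimum.accumulate(
    (pvals[order] * n_features / np.arange(1, n_features + 1))[::-1]
)[::-1]
adj = np.clip(adj, 0, 1)
print((adj < 0.05).sum(), "features called at FDR 0.05")
```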
Assuntos
Bactérias/genética , Infecções Bacterianas/diagnóstico , Bioestatística/métodos , Neoplasias Colorretais/microbiologia , Doença de Crohn/microbiologia , Microbioma Gastrointestinal/genética , Metagenoma , Infecções Bacterianas/microbiologia , Benchmarking , Estudos de Casos e Controles , Criança , Estudos de Coortes , Simulação por Computador , Feminino , Humanos , Masculino , Computação Matemática , Metagenômica/métodos , RNA Ribossômico 16S/genética , Reprodutibilidade dos Testes , Sensibilidade e Especificidade , Estatísticas não ParamétricasRESUMO
Statistical tests are the foundation on which data analysis for all types of clinical, basic, and epidemiological research relies. Clinicians need to understand these tests and the basics of biostatistics to correctly relay the data and results from their research and to understand the results of other scientific publications. The first article in this three-part editorial series aims to present, in a clear way, some essential concepts of biostatistics that can assist the gastroenterologist in understanding scientific research, for an evidence-based clinical practice.
Subjects
Gastroenterologists; Humans; Biostatistics/methods
ABSTRACT
The progression of disease for an individual can be described mathematically as a stochastic process. The individual experiences a failure event when the disease path first reaches or crosses a critical disease level. This crossing defines a failure event and a first hitting time, or time-to-event, both of which are important in medical contexts. When the context involves explanatory variables, there is usually an interest in incorporating regression structures into the analysis, and the methodology known as threshold regression comes into play. To date, most applications of threshold regression have been based on parametric families of stochastic processes. This paper presents a semiparametric form of threshold regression that requires the stochastic process to have only one key property, namely, stationary independent increments. As this property is frequently encountered in real applications, the model has potential for use in many fields. The mathematical underpinnings of this semiparametric approach for estimation and prediction are described. The basic data element required by the model is a pair of readings representing the observed change in time and the observed change in disease level, arising from either a failure event or survival of the individual to the end of the data record. An extension is presented for applications where the underlying disease process is unobservable but component covariate processes are available to construct a surrogate disease process. Threshold regression, used in combination with a data technique called Markov decomposition, allows the methods to handle longitudinal time-to-event data by uncoupling a longitudinal record into a sequence of single records. Computational aspects of the methods are straightforward. An array of simulation experiments that verify computational feasibility and statistical inference is reported in an online supplement. Case applications based on longitudinal observational data from The Osteoarthritis Initiative (OAI) study are presented to demonstrate the methodology and its practical use.
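The first-hitting-time construction can be illustrated with a toy simulation: a latent disease process with stationary independent increments (here a Wiener process with drift) is followed until it first crosses a critical level, producing either an event time or a censoring time. The parameter values are arbitrary placeholders, and the sketch illustrates only the data structure (time increments paired with level increments), not the paper's semiparametric estimation procedure.

```python
# Toy first-hitting-time simulation for a Wiener process with drift.
import numpy as np

rng = np.random.default_rng(5)

def first_hitting_time(start=10.0, threshold=0.0, drift=-0.5, sigma=1.0,
                       dt=0.1, max_time=100.0):
    """Return (time, hit); hit=False means the path was censored at max_time."""
    level, t = start, 0.0
    while t < max_time:
        # Stationary independent increment of the disease level over dt.
        level += drift * dt + sigma * np.sqrt(dt) * rng.normal()
        t += dt
        if level <= threshold:
            return t, True
    return max_time, False

print([first_hitting_time() for _ in range(5)])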
Assuntos
Bioestatística , Modelos Estatísticos , Humanos , Processos Estocásticos , Simulação por Computador , Fatores de Tempo , Bioestatística/métodosRESUMO
The bootstrap, introduced in Efron (1979, Bootstrap methods: another look at the jackknife, The Annals of Statistics 7, 1-26), is a landmark method for quantifying variability. It uses sampling with replacement with a sample size equal to that of the original data. We propose the upstrap, which samples with replacement either more or fewer observations than the original sample size. We illustrate the upstrap by solving a hard, but common, sample size calculation problem. The data and code used for the analysis in this article are available on GitHub (2018, https://github.com/ccrainic/upstrap).
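The idea is simple enough to sketch: resample the observed data with replacement at candidate sample sizes larger (or smaller) than the original n, and track a quantity of interest, here the proportion of resamples in which a one-sample t-test rejects, to gauge what sample size would be needed. The test, effect size, and candidate sizes below are illustrative assumptions; the authors' own code is at the GitHub link above.

```python
# Upstrap sketch: rejection rate of a one-sample t-test at several candidate n.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
data = rng.normal(loc=0.2, scale=1.0, size=80)   # observed sample with a modest effect

def upstrap_rejection_rate(x, new_n, n_resamples=2000, alpha=0.05):
    rejections = 0
    for _ in range(n_resamples):
        resample = rng.choice(x, size=new_n, replace=True)  # resample at the candidate size
        if stats.ttest_1samp(resample, 0.0).pvalue < alpha:
            rejections += 1
    return rejections / n_resamples

for new_n in (80, 160, 320):
    print(new_n, upstrap_rejection_rate(data, new_n))
```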
Subjects
Algorithms; Biostatistics/methods; Data Interpretation, Statistical; Humans; Regression Analysis; Sample Size
ABSTRACT
Missing data are a common problem for both the construction and the implementation of a prediction algorithm. Pattern submodels (PS), in which a separate submodel is fit for each missing-data pattern using only data from that pattern, are a computationally efficient remedy for handling missing data at both stages. Here, we show that PS (i) retain their predictive accuracy even when the missing data mechanism is not missing at random (MAR) and (ii) yield an algorithm that is the most predictive among all standard missing data strategies. Specifically, we show that the expected loss of a forecasting algorithm is minimized when each pattern-specific loss is minimized. Simulations and a re-analysis of the SUPPORT study confirm that PS generally outperform zero-imputation, mean-imputation, complete-case analysis, complete-case submodels, and even multiple imputation (MI). The degree of improvement is highly dependent on the missingness mechanism and the effect size of the missing predictors. When the data are MAR, MI can yield comparable forecasting performance but generally requires a larger computational cost. We also show that predictions from the PS approach are equivalent to the limiting predictions of an MI procedure that is dependent on missingness indicators (the MIMI model). The focus of this article is on out-of-sample prediction; implications for model inference are only briefly explored.
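The pattern-submodel idea lends itself to a short sketch: fit one linear submodel per missingness pattern, each on the rows with that pattern and using only the predictors observed under it, then score a new case with the submodel matching its own pattern. The linear model, least-squares fit, and simulated data below are illustrative assumptions, not the paper's analysis.

```python
# Pattern submodels: one least-squares fit per missingness pattern.
import numpy as np

rng = np.random.default_rng(7)

n, p = 500, 3
X = rng.normal(size=(n, p))
y = 1.0 + X @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.5, size=n)
X[rng.random((n, p)) < 0.3] = np.nan            # introduce missingness

def fit_pattern_submodels(X, y):
    models = {}
    patterns = ~np.isnan(X)                     # True where a predictor is observed
    for pattern in {tuple(row) for row in patterns}:
        rows = np.all(patterns == pattern, axis=1)
        obs_cols = np.array(pattern)
        design = np.column_stack([np.ones(rows.sum()), X[np.ix_(rows, obs_cols)]])
        coef, *_ = np.linalg.lstsq(design, y[rows], rcond=None)
        models[pattern] = coef
    return models

def predict(models, x_new):
    pattern = tuple(~np.isnan(x_new))           # assumes this pattern was seen in training
    coef = models[pattern]
    return coef[0] + np.asarray(x_new)[np.array(pattern)] @ coef[1:]

models = fit_pattern_submodels(X, y)
print(predict(models, np.array([0.3, np.nan, 1.2])))
```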
Assuntos
Pesquisa Biomédica/métodos , Bioestatística/métodos , Interpretação Estatística de Dados , Modelos Estatísticos , HumanosRESUMO
With the increasing availability of smartphones with Global Positioning System (GPS) capabilities, large-scale studies relating individual-level mobility patterns to a wide variety of patient-centered outcomes, from mood disorders to surgical recovery, are becoming a reality. Similar past studies have been small in scale and have provided wearable GPS devices to subjects. These devices typically collect mobility traces continuously without significant gaps in the data, and consequently the problem of data missingness has been safely ignored. Leveraging subjects' own smartphones makes it possible to scale up and extend the duration of these types of studies, but at the same time introduces a substantial challenge: to preserve a smartphone's battery, GPS can be active only for a small portion of the time, frequently less than 10%, leading to a tremendous missing data problem. We introduce a principled statistical approach, based on weighted resampling of the observed data, to impute the missing mobility traces, which we then summarize using different mobility measures. We compare the strengths of our approach to linear interpolation (LI), a popular approach for dealing with missing data, both analytically and through simulation of missingness for empirical data. We conclude that our imputation approach better mirrors human mobility both theoretically and over a sample of GPS mobility traces from 182 individuals in the Geolife data set, where, relative to LI, imputation resulted in a 10-fold reduction in the error averaged across all mobility features.
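To give a flavor of imputation by weighted resampling, the sketch below fills unobserved time bins of a one-dimensional displacement trace by drawing donors from the observed bins, with weights favoring observations close in time. The Gaussian time-proximity weighting, bin size, and on/off pattern are placeholders and not the weighting scheme developed in the paper.

```python
# Hot-deck style imputation of a one-dimensional mobility trace by weighted
# resampling of observed increments; weighting scheme is a placeholder.
import numpy as np

rng = np.random.default_rng(8)

n_bins = 288                                        # 5-minute bins over one day
true_increments = rng.normal(0, 50, size=n_bins)    # displacement per bin (meters)
observed = rng.random(n_bins) < 0.1                 # GPS active about 10% of the time

def impute_weighted_resampling(increments, observed, bandwidth=12.0):
    idx_obs = np.flatnonzero(observed)
    imputed = increments.copy()
    for i in np.flatnonzero(~observed):
        w = np.exp(-0.5 * ((idx_obs - i) / bandwidth) ** 2)  # prefer donors close in time
        donor = rng.choice(idx_obs, p=w / w.sum())
        imputed[i] = increments[donor]
    return imputed

imputed = impute_weighted_resampling(true_increments, observed)
# Compare one summary mobility feature (total distance) with the truth.
print(np.abs(true_increments).sum(), np.abs(imputed).sum())
```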
Assuntos
Bioestatística/métodos , Métodos Epidemiológicos , Sistemas de Informação Geográfica , Análise Espacial , Mapeamento Geográfico , HumanosRESUMO
Epidemiological studies on periodontal disease (PD) collect relevant biomarkers, such as the clinical attachment level (CAL) and the probed pocket depth (PPD), at pre-specified tooth sites clustered within a subject's mouth, along with various other demographic and biological risk factors. Routine cross-sectional evaluations are conducted under a linear mixed model (LMM) framework with underlying normality assumptions on the random terms. However, a careful investigation reveals considerable non-normality in those random terms, in the form of skewness and heavy tails. In addition, PD progression is hypothesized to be spatially referenced, i.e., disease status at proximal tooth sites may differ from that at distally located sites, and tooth missingness is non-random (or informative), given that the number and location of missing teeth inform about the periodontal health in that region. To mitigate these complexities, we consider a matrix-variate skew-t formulation of the LMM with a Markov graphical embedding to handle the site-level spatial associations of the bivariate (PPD and CAL) responses. Within the same framework, the non-randomly missing responses are imputed via a latent probit regression of the missingness indicator on the responses. Our hierarchical Bayesian framework, powered by relevant Markov chain Monte Carlo steps, addresses the aforementioned complexities within a unified paradigm and estimates model parameters with seamless sharing of information across the various stages of the hierarchy. Using both synthetic and real clinical data assessing PD status, we demonstrate a significantly improved fit of our proposal over various alternative models.
Assuntos
Bioestatística/métodos , Modelos Estatísticos , Simulação por Computador , Humanos , Doenças Periodontais/epidemiologiaRESUMO
Microorganisms play critical roles in human health and disease. They live in diverse communities in which they interact synergistically or antagonistically. Thus, for estimating microbial associations with clinical covariates, such as treatment effects, joint (multivariate) statistical models are preferred. Multivariate models allow one to estimate and exploit complex interdependencies among multiple taxa, yielding more powerful tests of exposure or treatment effects than application of taxon-specific univariate analyses. Analysis of microbial count data also requires special attention because the data commonly exhibit zero inflation, i.e., more zeros than expected from a standard count distribution. To meet these needs, we developed a Bayesian variable selection model for multivariate count data with excess zeros that incorporates information on the covariance structure of the outcomes (counts for multiple taxa) while estimating associations with the mean levels of these outcomes. Though there has been much work on zero-inflated models for longitudinal data, little attention has been given to high-dimensional multivariate zero-inflated data modeled via a general correlation structure. Through simulation, we compared the performance of the proposed method to that of existing univariate approaches, for both the binary ("excess zero") and count parts of the model. When outcomes were correlated, the proposed variable selection method maintained the type I error rate while boosting the ability to identify true associations in the binary component of the model. For the count part of the model, in some scenarios the univariate method had higher power than the multivariate approach. This higher power came at the cost of a highly inflated false discovery rate not observed with the proposed multivariate method. We applied the approach to oral microbiome data from the Pediatric HIV/AIDS Cohort Oral Health Study and identified five (of 44) species associated with HIV infection.
Assuntos
Bioestatística/métodos , Microbiota , Modelos Estatísticos , Teorema de Bayes , Infecções por HIV/microbiologia , Humanos , Saúde BucalRESUMO
Response-adaptive randomized clinical trials have gained popularity due to their flexibility in adjusting design components, including arm allocation probabilities, at any point in the trial according to interim results. In the Bayesian framework, allocation probabilities to different treatment arms are commonly defined as functionals of the posterior distributions of the parameters of the outcome distribution for each treatment. In a non-conjugate model, however, repeated updates of the posterior distribution can be computationally intensive. In this article, we propose an adaptation of sequential Monte Carlo for efficiently updating the posterior distribution of parameters as new outcomes are observed in a general adaptive trial design. An efficient computational tool facilitates the implementation of more flexible designs with more frequent interim looks, which can in turn reduce the required sample size and expected number of failures in clinical trials. Moreover, more complex statistical models that reflect realistic modeling assumptions can be used for the analysis of trial results.
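The core sequential-updating idea can be illustrated with a minimal particle sketch for a single arm's response probability: particles are reweighted by the likelihood of each new binary outcome and resampled when the effective sample size drops. This toy example assumes a Bernoulli outcome with a uniform prior; the nonconjugate models, move steps, and allocation-probability functionals of the actual design are not reproduced.

```python
# Minimal sequential Monte Carlo sketch: reweight particles as outcomes arrive,
# resample when the effective sample size (ESS) becomes small.
import numpy as np

rng = np.random.default_rng(9)

n_particles = 5000
theta = rng.uniform(0, 1, n_particles)           # prior draws for the response probability
weights = np.full(n_particles, 1.0 / n_particles)

def update(theta, weights, outcome):
    """Reweight particles by the Bernoulli likelihood of one new outcome."""
    lik = theta if outcome == 1 else (1.0 - theta)
    weights = weights * lik
    weights /= weights.sum()
    ess = 1.0 / np.sum(weights ** 2)
    if ess < n_particles / 2:                    # multinomial resampling step
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        theta, weights = theta[idx], np.full(n_particles, 1.0 / n_particles)
    return theta, weights

for outcome in rng.binomial(1, 0.3, size=50):    # stream of observed outcomes
    theta, weights = update(theta, weights, outcome)

print("posterior mean:", np.sum(weights * theta))
```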
Assuntos
Pesquisa Biomédica/métodos , Bioestatística/métodos , Modelos Estatísticos , Ensaios Clínicos Controlados Aleatórios como Assunto/métodos , Projetos de Pesquisa , Humanos , Método de Monte CarloRESUMO
Dynamic treatment regimens (DTRs) have recently drawn considerable attention as an effective tool for personalizing medicine. Sequential Multiple Assignment Randomized Trials (SMARTs) are often used to gather data for making inference on DTRs. In this article, we focus on regression analysis of DTRs from a two-stage SMART for competing-risk outcomes based on cumulative incidence functions (CIFs). Even though there is extensive work on the regression problem for DTRs, no research has been done on modeling the CIF for SMART trials. We extend existing CIF regression models to handle covariate effects for DTRs. Asymptotic properties are established for our proposed estimators. The models can be implemented using existing software via an augmented-data approximation. We show the improvement provided by our proposed methods through simulation and illustrate their practical utility through an analysis of a SMART neuroblastoma study, where disease progression cannot be observed after death.