Results 1 - 20 of 80
1.
Stat Med ; 43(1): 156-172, 2024 01 15.
Article in English | MEDLINE | ID: mdl-37919834

ABSTRACT

A basket trial aims to expedite the drug development process by evaluating a new therapy in multiple populations within the same clinical trial. Each population, referred to as a "basket", can be defined by disease type, biomarkers, or other patient characteristics. The objective of a basket trial is to identify the subset of baskets for which the new therapy shows promise. The conventional approach would be to analyze each of the baskets independently. Alternatively, several Bayesian dynamic borrowing methods have been proposed that share data across baskets when responses appear similar. These methods can achieve higher power than independent testing in exchange for a risk of some inflation in the type 1 error rate. In this paper we propose a frequentist approach to dynamic borrowing for basket trials using the adaptive lasso. Through simulation studies we demonstrate that the adaptive lasso can achieve power and type 1 error rates similar to those of existing Bayesian methods. The proposed approach has the benefit of being easier to implement and faster than existing methods. In addition, the adaptive lasso approach is very flexible: it can be extended to basket trials with any number of treatment arms and any type of endpoint.
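For readers who want to see the mechanics, the sketch below illustrates the generic adaptive lasso construction this abstract relies on: an initial estimate supplies coefficient-specific weights, and the weighted L1 fit is obtained by rescaling the design matrix. It is a toy Python example with simulated data; the ridge initializer, variable names, and basket coding are illustrative assumptions, not the authors' implementation.

```python
# Minimal generic adaptive lasso sketch (assumed toy data, not the paper's code).
import numpy as np
from sklearn.linear_model import Ridge, LassoCV

rng = np.random.default_rng(0)
n, p = 200, 10                          # e.g., p baskets coded as indicator contrasts
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, 0.8, 0, 0, 0, 0.6, 0, 0, 0, 0])
y = X @ beta_true + rng.normal(size=n)

# Step 1: initial estimate (ridge used here for numerical stability)
beta_init = Ridge(alpha=1.0).fit(X, y).coef_
gamma = 1.0
w = 1.0 / (np.abs(beta_init) ** gamma + 1e-8)   # adaptive weights

# Step 2: weighted lasso via column rescaling, then undo the scaling
X_scaled = X / w
fit = LassoCV(cv=5).fit(X_scaled, y)
beta_hat = fit.coef_ / w

print("nonzero (i.e., 'promising') coefficients:", np.nonzero(beta_hat != 0)[0])
```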


Subjects
Research Design, Humans, Bayes Theorem, Computer Simulation
2.
Stat Med ; 43(21): 4131-4147, 2024 Sep 20.
Article in English | MEDLINE | ID: mdl-39007408

ABSTRACT

In this work, we propose methods to examine how the complex interrelationships between clinical symptoms and, separately, brain imaging biomarkers change over time leading up to the diagnosis of a disease in subjects with a known genetic near-certainty of disease. We propose a time-dependent undirected graphical model that ensures temporal and structural smoothness across time-specific networks to examine the trajectories of interactions between markers aligned at the time of disease onset. Specifically, we anchor subjects relative to the time of disease diagnosis (anchoring time) as in a revival process, and we estimate networks at each time point of interest relative to the anchoring time. To use all available data, we apply kernel weights to borrow information across observations that are close to the time of interest. Adaptive lasso weights are introduced to encourage temporal smoothness in edge strength, while a novel elastic fused-ℓ0 penalty removes spurious edges and encourages temporal smoothness in network structure. Our approach can handle practical complications such as unbalanced visit times. We conduct simulation studies to compare our approach with existing methods. We then apply our method to data from PREDICT-HD, a large prospective observational study of pre-manifest Huntington's disease (HD) patients, to identify symptom and imaging network changes that precede clinical diagnosis of HD.
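As a rough illustration of the kind of objective described (kernel-weighted estimation with adaptive and fused penalties on time-specific networks), one generic form is shown below. This is a sketch for orientation only and is not the paper's exact elastic fused-ℓ0 formulation.

```latex
% Illustrative kernel-weighted graphical objective with adaptive and fused
% penalties (S_i = x_i x_i^T); not the paper's exact elastic fused-l0 penalty.
\hat{\Theta}(t) = \arg\min_{\Theta \succ 0}
  -\sum_{i} K_h(t_i - t)\bigl[\log\det\Theta - \operatorname{tr}(S_i\Theta)\bigr]
  + \lambda_1 \sum_{j<k} w_{jk}\,|\theta_{jk}|
  + \lambda_2 \sum_{j<k} \bigl|\theta_{jk} - \hat{\theta}_{jk}(t-\Delta)\bigr|,
\qquad w_{jk} = |\tilde{\theta}_{jk}|^{-\gamma},
```

where the fused term couples the current estimate to the estimate at the previous anchored time point and the adaptive weights come from an initial unpenalized fit.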


Subjects
Computer Simulation, Huntington Disease, Statistical Models, Neuroimaging, Humans, Neuroimaging/methods, Huntington Disease/diagnostic imaging, Time Factors, Prospective Studies, Brain/diagnostic imaging, Biomarkers
3.
Sensors (Basel) ; 24(12)2024 Jun 09.
Article in English | MEDLINE | ID: mdl-38931540

ABSTRACT

A motor imagery brain-computer interface connects the human brain and computers via electroencephalography (EEG). However, individual differences in the frequency ranges of brain activity during motor imagery tasks pose a challenge that limits manual feature extraction for motor imagery classification. To extract features that match specific subjects, we proposed a novel motor imagery classification model using distinctive feature fusion with adaptive structural LASSO. Specifically, we extracted spatial-domain features from overlapping and multi-scale sub-bands of EEG signals and mined discriminative features by fusing the task relevance of features with spatial information into the adaptive-LASSO-based feature selection. We evaluated the proposed model on public motor imagery EEG datasets, demonstrating its excellent performance. Ablation studies and feature selection visualizations further verified the model's potential for EEG analysis.
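The sketch below illustrates a generic version of the pipeline implied by the abstract: band-power features computed from overlapping EEG sub-bands, followed by sparse feature selection. It uses plain L1-penalized logistic regression as a stand-in for the adaptive structural LASSO, and the data, sampling rate, and band choices are all assumptions rather than the authors' setup.

```python
# Generic sub-band band-power features + sparse selection (illustrative only).
import numpy as np
from scipy.signal import welch
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
fs = 250                                   # assumed sampling rate (Hz)
n_trials, n_channels, n_samples = 120, 22, 1000
eeg = rng.normal(size=(n_trials, n_channels, n_samples))   # placeholder EEG epochs
labels = rng.integers(0, 2, size=n_trials)                  # e.g., left vs. right hand

# Overlapping 4-Hz sub-bands: 4-8, 6-10, ..., 36-40 Hz
bands = [(lo, lo + 4) for lo in range(4, 38, 2)]

def band_power(epochs, fs, band):
    """Mean power spectral density within one frequency band, per trial and channel."""
    freqs, psd = welch(epochs, fs=fs, nperseg=256, axis=-1)
    mask = (freqs >= band[0]) & (freqs < band[1])
    return psd[..., mask].mean(axis=-1)      # shape: (trials, channels)

# One feature column per (channel, sub-band) pair
features = np.log(np.hstack([band_power(eeg, fs, b) for b in bands]))

# Sparse (L1) logistic regression as a simple stand-in for the selection step
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
print("CV accuracy:", cross_val_score(clf, features, labels, cv=5).mean())
```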


Subjects
Brain-Computer Interfaces, Electroencephalography, Computer-Assisted Signal Processing, Electroencephalography/methods, Humans, Algorithms, Brain/physiology, Brain/diagnostic imaging, Imagination/physiology
4.
Lifetime Data Anal ; 30(2): 472-500, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38436831

ABSTRACT

In clinical studies, one often encounters time-to-event data that are subject to right censoring and for which a fraction of the patients under study never experience the event of interest. Such data can be modeled using cure models in survival analysis. In the presence of a cure fraction, the mixture cure model is popular, since it allows modeling of the probability of being cured (called the incidence) and of the survival function of the uncured individuals (called the latency). In this paper, we develop a variable selection procedure for the incidence and latency parts of a mixture cure model, consisting of a logistic model for the incidence and a semiparametric accelerated failure time model for the latency. We use a penalized likelihood approach, based on adaptive LASSO penalties for each part of the model, and we consider two algorithms for optimizing the criterion function. Extensive simulations are carried out to assess the accuracy of the proposed selection procedure. Finally, we apply the proposed method to a real dataset on heart failure patients with left ventricular systolic dysfunction.
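A standard mixture cure formulation consistent with this description, together with one plausible form of the penalized criterion (the exact criterion used in the paper may differ), is:

```latex
% Mixture cure model: logistic incidence pi(z), accelerated failure time latency.
S_{\mathrm{pop}}(t \mid \mathbf{x}, \mathbf{z})
  = 1 - \pi(\mathbf{z}) + \pi(\mathbf{z})\, S_u(t \mid \mathbf{x}),
\qquad
\pi(\mathbf{z}) = \frac{\exp(\mathbf{z}^{\top}\boldsymbol{\gamma})}
                       {1 + \exp(\mathbf{z}^{\top}\boldsymbol{\gamma})},
\qquad
\log T_u = \mathbf{x}^{\top}\boldsymbol{\beta} + \varepsilon .

% Illustrative penalized likelihood with separate adaptive LASSO penalties:
\max_{\boldsymbol{\gamma},\,\boldsymbol{\beta}}
  \; \ell(\boldsymbol{\gamma}, \boldsymbol{\beta})
  - \lambda_1 \sum_{j} \hat{w}_j\,|\gamma_j|
  - \lambda_2 \sum_{k} \hat{v}_k\,|\beta_k|,
\qquad
\hat{w}_j = |\tilde{\gamma}_j|^{-1},\quad \hat{v}_k = |\tilde{\beta}_k|^{-1}.
```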


Subjects
Algorithms, Statistical Models, Humans, Likelihood Functions, Survival Analysis, Logistic Models, Computer Simulation
5.
Stat Med ; 42(29): 5491-5512, 2023 12 20.
Article in English | MEDLINE | ID: mdl-37816678

ABSTRACT

Joint models for longitudinal and survival data (JMLSs) have been widely used in recent years to investigate the relationship between longitudinal and survival data in clinical trials. However, existing studies mainly focus on independent survival data. In many clinical trials, survival data may be bivariately correlated. To this end, this paper proposes a novel JMLS accommodating multivariate longitudinal and bivariate correlated time-to-event data. Nonparametric marginal survival hazard functions are transformed to bivariate normal random variables. Bayesian penalized splines are employed to approximate unknown baseline hazard functions. Incorporating the Metropolis-Hastings algorithm into the Gibbs sampler, we develop a Bayesian adaptive Lasso method to simultaneously estimate parameters and baseline hazard functions, and select important predictors in the considered JMLS. Simulation studies and an example taken from the International Breast Cancer Study Group are used to illustrate the proposed methodologies.
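The Bayesian adaptive Lasso mentioned here typically assigns each coefficient its own Laplace shrinkage scale rather than a single common one; a generic form of such a prior (an illustration, not the paper's exact specification) is:

```latex
% Generic Bayesian adaptive lasso prior: coefficient-specific Laplace scales
% lambda_j with Gamma hyperpriors (illustrative form only).
\beta_j \mid \lambda_j, \sigma^2 \;\sim\;
  \frac{\lambda_j}{2\sqrt{\sigma^2}}
  \exp\!\left(-\frac{\lambda_j\,|\beta_j|}{\sqrt{\sigma^2}}\right),
\qquad
\lambda_j^2 \;\sim\; \mathrm{Gamma}(a, b)\quad \text{independently for each } j .
```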


Subjects
Algorithms, Statistical Models, Humans, Bayes Theorem, Multivariate Analysis, Computer Simulation
6.
Stat Med ; 2023 Jan 04.
Article in English | MEDLINE | ID: mdl-36599810

ABSTRACT

There has been growing interest in incorporating auxiliary summary information from external studies into the analysis of internal individual-level data. In this paper, we propose an adaptive estimation procedure for an additive risk model to integrate auxiliary subgroup survival information via a penalized method of moments technique. Our approach can accommodate information from heterogeneous data. Parameters quantifying the magnitude of potential incomparability between the internal data and the external auxiliary information are introduced in our framework, with nonzero components of these parameters indicating a violation of the homogeneity assumption. We further develop an efficient computational algorithm to solve the numerical optimization problem by profiling out the nuisance parameters. Asymptotically, our method can be as efficient as if all the incomparable auxiliary information were accurately identified and automatically excluded from consideration. The asymptotic normality of the proposed estimator of the regression coefficients is established, with an explicit formula for the asymptotic variance-covariance matrix that can be consistently estimated from the data. Simulation studies show that the proposed method yields a substantial gain in statistical efficiency over the conventional method using the internal data only, and reduces estimation biases when the given auxiliary survival information is incomparable. We illustrate the proposed method with a lung cancer survival study.
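One way to picture this setup is the sketch below: an additive risk model for the internal data plus a penalized method-of-moments criterion in which bias parameters absorb incomparable auxiliary constraints. The criterion shown is an illustrative form, not the paper's exact estimator.

```latex
% Additive risk model for the internal data (Aalen / Lin-Ying type):
\lambda(t \mid \mathbf{X}) = \lambda_0(t) + \boldsymbol{\beta}^{\top}\mathbf{X}(t).

% Illustrative penalized method-of-moments criterion: g_n collects the auxiliary
% subgroup survival constraints, eta_k absorbs the potential bias of the k-th
% constraint, and an adaptive lasso penalty shrinks comparable eta_k to zero.
\min_{\boldsymbol{\beta},\,\boldsymbol{\eta}}
  \;\bigl\{\mathbf{g}_n(\boldsymbol{\beta}) - \boldsymbol{\eta}\bigr\}^{\top}
   \mathbf{W}
   \bigl\{\mathbf{g}_n(\boldsymbol{\beta}) - \boldsymbol{\eta}\bigr\}
  + \lambda \sum_{k} \hat{w}_k\,|\eta_k|,
\qquad \hat{w}_k = |\tilde{\eta}_k|^{-\gamma}.
```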

7.
Stat Sin ; 33(2): 633-662, 2023 Apr.
Article in English | MEDLINE | ID: mdl-37197479

ABSTRACT

Recent technological advances have made it possible to measure many features of multiple types in biomedical studies. However, some data types or features may not be measured for all study subjects because of cost or other constraints. We use a latent variable model to characterize the relationships across and within data types and to infer missing values from observed data. We develop a penalized-likelihood approach for variable selection and parameter estimation and devise an efficient expectation-maximization algorithm to implement our approach. We establish the asymptotic properties of the proposed estimators when the number of features increases at a polynomial rate in the sample size. Finally, we demonstrate the usefulness of the proposed methods using extensive simulation studies and provide an application to a motivating multi-platform genomics study.

8.
Biom J ; 65(1): e2100139, 2023 01.
Article in English | MEDLINE | ID: mdl-35837982

ABSTRACT

Recent technological advances have made it possible to collect high-dimensional genomic data along with clinical data on a large number of subjects. In the studies of chronic diseases such as cancer, it is of great interest to integrate clinical and genomic data to build a comprehensive understanding of the disease mechanisms. Despite extensive studies on integrative analysis, it remains an ongoing challenge to model the interaction effects between clinical and genomic variables, due to the high dimensionality of the data and the heterogeneity across data types. In this paper, we propose an integrative approach that models interaction effects using a single-index varying-coefficient model, where the effects of genomic features can be modified by clinical variables. We propose a penalized approach for separate selection of main and interaction effects. Notably, the proposed methods can be applied to right-censored survival outcomes based on a Cox proportional hazards model. We demonstrate the advantages of the proposed methods through extensive simulation studies and provide applications to a motivating cancer genomic study.
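An illustrative form of a single-index varying-coefficient Cox model of the kind described, in which clinical covariates Z modulate the genomic effects through unknown link functions of a single index (a sketch under assumed notation, not necessarily the authors' exact parameterization), is:

```latex
% Illustrative single-index varying-coefficient Cox model: the effect of each
% genomic feature X_j is modified by the clinical index theta'Z through g_j;
% main clinical effects enter through alpha.
\lambda(t \mid \mathbf{X}, \mathbf{Z})
  = \lambda_0(t)\,
    \exp\!\Bigl(\boldsymbol{\alpha}^{\top}\mathbf{Z}
      + \sum_{j=1}^{p} g_j(\boldsymbol{\theta}^{\top}\mathbf{Z})\, X_j\Bigr),
\qquad \lVert\boldsymbol{\theta}\rVert_2 = 1 .
```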


Subjects
Genomics, Neoplasms, Humans, Proportional Hazards Models, Computer Simulation, Neoplasms/genetics
9.
Biom J ; 65(5): e2200047, 2023 06.
Article in English | MEDLINE | ID: mdl-36960476

ABSTRACT

Cross-validation is the standard method for hyperparameter tuning, or calibration, of machine learning algorithms. The adaptive lasso is a popular class of penalized approaches based on weighted L1-norm penalties, with weights derived from an initial estimate of the model parameter. Although it violates the paramount principle of cross-validation, according to which no information from the hold-out test set should be used when constructing the model on the training set, a "naive" cross-validation scheme is often implemented for the calibration of the adaptive lasso. The unsuitability of this naive cross-validation scheme in this context has not been well documented in the literature. In this work, we recall why the naive scheme is theoretically unsuitable and how proper cross-validation should be implemented in this particular context. Using both synthetic and real-world examples and considering several versions of the adaptive lasso, we illustrate the flaws of the naive scheme in practice. In particular, we show that it can lead to the selection of adaptive lasso estimates that perform substantially worse than those selected via a proper scheme in terms of both support recovery and prediction error. In other words, our results show that the theoretical unsuitability of the naive scheme translates into suboptimality in practice, and call for abandoning it.
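The sketch below contrasts the naive and proper cross-validation schemes for a simple ridge-initialized adaptive lasso in linear regression. It is a minimal illustration of the two schemes discussed, with assumed toy data, not the authors' exact experimental setup.

```python
# Naive vs. proper cross-validation for a simple adaptive lasso (illustrative).
import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p = 150, 30
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 1.0
y = X @ beta + rng.normal(size=n)

def adaptive_lasso(X_tr, y_tr, lam, w):
    """Weighted L1 fit obtained by rescaling the columns of the design matrix."""
    coef = Lasso(alpha=lam, max_iter=10_000).fit(X_tr / w, y_tr).coef_
    return coef / w

def cv_error(lam, naive):
    # Naive scheme: adaptive weights computed ONCE on the full data, so the
    # hold-out folds leak into the weights. Proper scheme: weights refit per fold.
    w_full = 1.0 / (np.abs(Ridge(alpha=1.0).fit(X, y).coef_) + 1e-8)
    errs = []
    for tr, te in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        if naive:
            w = w_full
        else:
            w = 1.0 / (np.abs(Ridge(alpha=1.0).fit(X[tr], y[tr]).coef_) + 1e-8)
        b = adaptive_lasso(X[tr], y[tr], lam, w)
        errs.append(np.mean((y[te] - X[te] @ b) ** 2))
    return np.mean(errs)

grid = np.logspace(-3, 0, 20)
lam_naive = grid[np.argmin([cv_error(l, naive=True) for l in grid])]
lam_proper = grid[np.argmin([cv_error(l, naive=False) for l in grid])]
print("lambda selected by naive vs proper CV:", lam_naive, lam_proper)
```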


Subjects
Algorithms, Research Design, Calibration
10.
Biom J ; 65(7): e2100406, 2023 10.
Article in English | MEDLINE | ID: mdl-37189217

ABSTRACT

There has been growing interest in leveraging external control data to augment the randomized control group in clinical trials and enable more informative decision making. In recent years, the quality and availability of real-world data used as external controls have improved steadily. However, information borrowing by directly pooling such external controls with randomized controls may lead to biased estimates of the treatment effect. Dynamic borrowing methods under the Bayesian framework have been proposed to better control the false positive error. However, the numerical computation and, especially, the parameter tuning of those Bayesian dynamic borrowing methods remain a challenge in practice. In this paper, we present a frequentist interpretation of a Bayesian commensurate prior borrowing approach and describe intrinsic challenges associated with this method from the perspective of optimization. Motivated by this observation, we propose a new dynamic borrowing approach using the adaptive lasso. The treatment effect estimate derived from this method follows a known asymptotic distribution, which can be used to construct confidence intervals and conduct hypothesis tests. The finite sample performance of the method is evaluated through extensive Monte Carlo simulations under different settings. We observed highly competitive performance of the adaptive lasso compared to Bayesian approaches. Methods for selecting tuning parameters are also thoroughly discussed based on results from numerical studies and an illustrative example.
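A stylized version of the adaptive-lasso borrowing idea, written for a continuous endpoint with a single external control arm (an illustration only; the paper's formulation is more general), is:

```latex
% Stylized dynamic borrowing via the adaptive lasso: delta is the external-control
% bias term and tau the treatment effect; only delta is penalized, so delta = 0
% corresponds to full pooling of the two control arms.
\min_{\mu_c,\,\delta,\,\tau}
  \sum_{i \in \mathrm{RC}} (y_i - \mu_c)^2
  + \sum_{i \in \mathrm{EC}} (y_i - \mu_c - \delta)^2
  + \sum_{i \in \mathrm{T}} (y_i - \mu_c - \tau)^2
  + \lambda\, \hat{w}\,|\delta|,
\qquad \hat{w} = |\tilde{\delta}|^{-\gamma},
```

where RC, EC, and T index the randomized controls, external controls, and treated patients, and the weight comes from an unpenalized initial estimate of the control-arm discrepancy.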


Subjects
Statistical Models, Research Design, Bayes Theorem, Monte Carlo Method
11.
Stat Med ; 41(12): 2132-2165, 2022 05 30.
Article in English | MEDLINE | ID: mdl-35172378

ABSTRACT

Several recently developed methods have the potential to harness machine learning in the pursuit of target quantities inspired by causal inference, including inverse weighting, doubly robust estimating equations, and substitution estimators like targeted maximum likelihood estimation. There are even more recent augmentations of these procedures that can increase robustness by adding a layer of cross-validation (cross-validated targeted maximum likelihood estimation and double machine learning, as applied to substitution and estimating equation approaches, respectively). While these methods have been evaluated individually on simulated and experimental data sets, a comprehensive analysis of their performance across real-data-based simulations has yet to be conducted. In this work, we benchmark multiple widely used methods for estimation of the average treatment effect using data from ten different nutrition intervention studies. A nonparametric regression method, the undersmoothed highly adaptive lasso, is used to generate the simulated distribution, which preserves important features of the observed data and reproduces a set of true target parameters. For each simulated dataset, we apply the methods above to estimate the average treatment effects as well as their standard errors and resulting confidence intervals. Based on the analytic results, a general recommendation is put forth for use of the cross-validated variants of both substitution and estimating equation estimators. We conclude that the additional layer of cross-validation helps avoid unintentional over-fitting of nuisance parameter functionals and leads to more robust inferences.


Subjects
Machine Learning, Research Design, Causality, Computer Simulation, Humans, Likelihood Functions, Statistical Models, Regression Analysis
12.
Stat Med ; 41(24): 4941-4960, 2022 10 30.
Article in English | MEDLINE | ID: mdl-35946065

ABSTRACT

The Fine-Gray proportional sub-distribution hazards (PSH) model is among the most popular regression models for competing risks time-to-event data. This article develops a fast safe feature elimination method, named PSH-SAFE, for fitting the penalized Fine-Gray PSH model with a Lasso (or adaptive Lasso) penalty. Our PSH-SAFE procedure is straightforward to implement, fast, and scales well to ultrahigh dimensional data. We also show that, as a feature screening procedure, PSH-SAFE is safe in the sense that the eliminated features are guaranteed to be inactive features in the original Lasso (or adaptive Lasso) estimator for the penalized PSH model. We evaluate the performance of the PSH-SAFE procedure in terms of computational efficiency, screening efficiency and safety, run-time, and prediction accuracy on multiple simulated datasets and a real bladder cancer dataset. Our empirical results show that the PSH-SAFE procedure possesses desirable screening efficiency and safety properties and can offer substantially improved computational efficiency as well as similar or better prediction performance in comparison to its baseline competitors.


Subjects
Urinary Bladder Neoplasms, Humans, Mass Screening, Proportional Hazards Models, Research, Urinary Bladder Neoplasms/diagnosis
13.
Lifetime Data Anal ; 2022 Nov 07.
Article in English | MEDLINE | ID: mdl-36336732

ABSTRACT

Targeted maximum likelihood estimation (TMLE) provides a general methodology for the estimation of causal parameters in the presence of high-dimensional nuisance parameters. Generally, TMLE consists of a two-step procedure that combines data-adaptive nuisance parameter estimation with semiparametric efficiency and rigorous statistical inference obtained via a targeted update step. In this paper, we demonstrate the practical applicability of TMLE-based causal inference in survival and competing risks settings where event times are not confined to take place on a discrete and finite grid. We focus on estimation of causal effects of time-fixed treatment decisions on survival and absolute risk probabilities, considering different univariate and multidimensional parameters. Besides providing general guidance on using TMLE for survival and competing risks analysis, we further describe how the previous work can be extended with the use of loss-based cross-validated estimation, also known as super learning, of the conditional hazards. We illustrate the usage of the considered methods using publicly available data from a trial on adjuvant chemotherapy for colon cancer. R software code to implement all considered algorithms and to reproduce all analyses is available in an accompanying online appendix on GitHub.

14.
Brief Bioinform ; 20(5): 1913-1924, 2019 09 27.
Article in English | MEDLINE | ID: mdl-30032279

ABSTRACT

In the genetic systems that regulate complex traits, metabolites, gene expression levels, RNA editing levels, and DNA methylation, series of small-effect and linked genes exist. To date, however, little is known about how to design an efficient framework for the detection of these kinds of genes. In this article, we propose a genome-wide composite interval mapping (GCIM) method for F2 populations. First, control of the polygenic background via marker selection during the genome scanning of linkage analysis was replaced by estimation of the polygenic variance, as in a genome-wide association study. This can control large, intermediate, and minor polygenic backgrounds in genome scanning. Then, additive and dominant effects for each putative quantitative trait locus (QTL) were scanned separately, so that a curve of the negative logarithm of the P-value against genome position could be obtained separately for each kind of effect. In each curve, all the peaks were identified as potential QTLs. Thus, almost all the small-effect and linked QTLs are included in a multi-locus model. Finally, the adaptive least absolute shrinkage and selection operator (adaptive lasso) was used to estimate all the effects in the multi-locus model, and all the nonzero effects were further assessed by a likelihood ratio test for true QTL identification. This method was used to reanalyze four rice traits. Among the 25 known genes detected in this study, 16 small-effect genes were identified only by GCIM. To further demonstrate GCIM, a series of Monte Carlo simulation experiments was performed. As a result, GCIM is demonstrated to be more powerful than the widely used methods for the detection of closely linked and small-effect QTLs.


Subjects
Genetic Models, Quantitative Trait Loci, DNA Methylation, Genetic Linkage, Humans, Monte Carlo Method
15.
Stat Med ; 40(13): 3181-3195, 2021 06 15.
Article in English | MEDLINE | ID: mdl-33819928

ABSTRACT

In cancer studies, it is important to understand disease heterogeneity among patients so that precision medicine can target high-risk patients at the right time. Many feature variables, such as demographic variables and biomarkers, combined with a patient's survival outcome, can be used to infer such latent heterogeneity. In this work, we propose a mixture model for each patient's latent survival pattern, where the mixing probabilities for the latent groups are modeled through a multinomial distribution. The Bayesian information criterion is used for selecting the number of latent groups. Furthermore, we incorporate variable selection with the adaptive lasso into inference so that only a few feature variables will be selected to characterize the latent heterogeneity. We show that our adaptive lasso estimator has oracle properties when the number of parameters diverges with the sample size. The finite sample performance is evaluated through simulation studies, and the proposed method is illustrated using two datasets.
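An illustrative form of such a latent-group survival mixture, with multinomial-logistic mixing probabilities and an adaptive lasso penalty on the feature coefficients (the paper's exact parameterization may differ), is:

```latex
% Illustrative latent-group survival mixture: the mixing probabilities follow a
% multinomial-logistic model in the features x; the adaptive lasso penalty acts
% on the feature coefficients so only a few variables drive the grouping.
f(t \mid \mathbf{x}) = \sum_{k=1}^{K} \pi_k(\mathbf{x})\, f_k(t \mid \mathbf{x}),
\qquad
\pi_k(\mathbf{x}) =
  \frac{\exp(\mathbf{x}^{\top}\boldsymbol{\alpha}_k)}
       {\sum_{l=1}^{K}\exp(\mathbf{x}^{\top}\boldsymbol{\alpha}_l)},
\qquad
\text{penalty: } \lambda \sum_{k}\sum_{j} \hat{w}_{kj}\,|\alpha_{kj}| .
```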


Subjects
Precision Medicine, Bayes Theorem, Biomarkers, Computer Simulation, Humans, Probability
16.
Stat Med ; 40(15): 3604-3624, 2021 07 10.
Article in English | MEDLINE | ID: mdl-33851463

ABSTRACT

Alzheimer's disease can be diagnosed by analyzing brain images (eg, magnetic resonance imaging, MRI) and neuropsychological tests (eg, the mini-mental state examination, MMSE). A partially linear mean shift model (PLMSM) is here proposed to investigate the relationship between the MMSE score and high-dimensional regions of interest in MRI, and to detect outliers. In the presence of high-dimensional data, existing Bayesian approaches (eg, Markov chain Monte Carlo) to analyzing a PLMSM incur intensive computational cost, require a large amount of memory, and have low convergence rates. To address these issues, a variational Bayesian inference is developed to simultaneously estimate parameters and nonparametric functions and to identify outliers in a PLMSM. A Bayesian P-splines method is presented to approximate the nonparametric functions, a Bayesian adaptive Lasso approach is employed to select predictors, and outliers are detected via the classification variable. Two simulation studies are conducted to assess the finite sample performance of the proposed method. An MRI dataset on cognitive ability in the elderly is used to corroborate the proposed method.


Subjects
Alzheimer Disease, Aged, Algorithms, Bayes Theorem, Humans, Linear Models, Monte Carlo Method, Neuroimaging
17.
Stat Med ; 39(9): 1311-1327, 2020 04 30.
Article in English | MEDLINE | ID: mdl-31985088

ABSTRACT

Linear mixed models (LMMs) and their extensions have been widely used for high-dimensional genomic data analyses. While LMMs hold great promise for risk prediction research, the high dimensionality of the data and the differing effect sizes of genomic regions pose great analytical and computational challenges. In this work, we present a multikernel linear mixed model with adaptive lasso (KLMM-AL) to predict phenotypes using high-dimensional genomic data. We develop two algorithms for estimating parameters from our model and also establish the asymptotic properties of the LMM with adaptive lasso when only one dependent observation is available. The proposed KLMM-AL can account for heterogeneous effect sizes from different genomic regions, capture both additive and nonadditive genetic effects, and adaptively and efficiently select predictive genomic regions and their corresponding effects. Through simulation studies, we demonstrate that KLMM-AL outperforms most existing methods. Moreover, KLMM-AL achieves high sensitivity and specificity in selecting predictive genomic regions. KLMM-AL is further illustrated by an application to the sequencing dataset obtained from the Alzheimer's Disease Neuroimaging Initiative.
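An illustrative multikernel linear mixed model of the kind described, with one kernel per genomic region and shrinkage on the region-specific variance components (a sketch under assumed notation, not the authors' exact formulation), is:

```latex
% Illustrative multikernel linear mixed model: one random effect per genomic
% region m with its own kernel K_m; shrinking sigma_m^2 to zero drops region m.
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \sum_{m=1}^{M}\mathbf{g}_m + \boldsymbol{\varepsilon},
\qquad
\mathbf{g}_m \sim N(\mathbf{0},\, \sigma_m^2\mathbf{K}_m),
\qquad
\boldsymbol{\varepsilon} \sim N(\mathbf{0},\, \sigma_e^2\mathbf{I}),
```

with an adaptive-lasso-type penalty on the variance components, roughly of the form lambda times the weighted sum of the sigma_m^2 terms, selecting the predictive regions (an illustrative penalization; the paper's exact criterion may differ).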


Subjects
Algorithms, Genomics, Computer Simulation, Linear Models, Phenotype
18.
Stat Med ; 39(2): 156-170, 2020 01 30.
Article in English | MEDLINE | ID: mdl-31758598

ABSTRACT

We propose time-varying coefficient model selection and estimation based on the spline approach, which is capable of capturing time-dependent covariate effects. The new penalty function utilizes local-region information for varying-coefficient estimation, in contrast to the traditional model selection approach that focuses on the entire region. The proposed method is extremely useful when the signals associated with relevant predictors are time-dependent, and detecting relevant covariate effects in a local region is more scientifically relevant than detecting them over the entire region. Our simulation studies indicate that the proposed model selection incorporating local features outperforms global-feature model selection approaches. The proposed method is also illustrated through a longitudinal growth and health study from the National Heart, Lung, and Blood Institute.
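One way such a local-region penalty can be written, using a B-spline expansion of each coefficient function and penalizing groups of basis coefficients region by region (an illustrative form, not necessarily the paper's exact penalty), is:

```latex
% Illustrative spline-based varying-coefficient model with a local-region
% penalty: gamma_{j,r} collects the basis coefficients of beta_j(t) whose
% B-splines are supported on local region r, so whole local pieces can be zeroed.
y_i(t) = \sum_{j=1}^{p} \beta_j(t)\, x_{ij}(t) + \varepsilon_i(t),
\qquad
\beta_j(t) = \sum_{m=1}^{M} \gamma_{jm} B_m(t),
\qquad
\mathrm{pen}(\boldsymbol{\gamma}) = \lambda \sum_{j}\sum_{r} w_{jr}\,
  \lVert\boldsymbol{\gamma}_{j,r}\rVert_2 .
```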


Subjects
Longitudinal Studies, Regression Analysis, Algorithms, Computer Simulation, Humans, Time
19.
Stat Med ; 2020 Feb 26.
Article in English | MEDLINE | ID: mdl-32101332

ABSTRACT

This study develops a two-part hidden Markov model (HMM) for analyzing semicontinuous longitudinal data in the presence of missing covariates. The proposed model manages a semicontinuous variable by splitting it into two random variables: a binary indicator for determining the occurrence of excess zeros at all occasions and a continuous random variable for examining its actual level. For the continuous longitudinal response, an HMM is proposed to describe the relationship between the observation and unobservable finite-state transition processes. The HMM consists of two major components. The first component is a transition model for investigating how potential covariates influence the probabilities of transitioning from one hidden state to another. The second component is a conditional regression model for examining the state-specific effects of covariates on the response. A shared random effect is introduced to each part of the model to accommodate possible unobservable heterogeneity among observation processes and the nonignorability of missing covariates. A Bayesian adaptive least absolute shrinkage and selection operator (lasso) procedure is developed to conduct simultaneous variable selection and estimation. The proposed methodology is applied to a study on the Alzheimer's Disease Neuroimaging Initiative dataset. New insights into the pathology of Alzheimer's disease and its potential risk factors are obtained.

20.
Multivariate Behav Res ; 55(6): 811-824, 2020.
Article in English | MEDLINE | ID: mdl-31682150

ABSTRACT

The nominal response model is an item response theory model that does not require an ordering of the response options. However, while providing a very flexible approach to modeling polytomous responses, it involves the estimation of many parameters, with a risk of numerical instability and overfitting. The lasso is a technique widely used to achieve model selection and regularization. In this paper, we propose the use of a fused lasso penalty to group response categories and perform regularization of the unidimensional and multidimensional nominal response models. The good performance of the method is illustrated through real-data applications and simulation studies.
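A generic form of a fused lasso penalty that groups the category parameters of the nominal response model, written here as all-pairs fusion of the item-category slopes (an illustration; the paper's parameterization may differ), is:

```latex
% Generic all-pairs fused lasso penalty on the item-category slope parameters
% a_{jc} of item j; categories whose estimates are fused are effectively merged.
\mathrm{pen}(\mathbf{a}) = \lambda \sum_{j} \sum_{c < c'} \bigl| a_{jc} - a_{jc'} \bigr| .
```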


Subjects
Psychological Adaptation/physiology, Computer Simulation/statistics & numerical data, Reaction Time/physiology, Surveys and Questionnaires/statistics & numerical data, Algorithms, Humans, Statistical Models, Theoretical Models, Multidimensional Scaling Analysis, Psychometrics, Research Design