1.
J Biopharm Stat ; : 1-14, 2023 May 10.
Article in English | MEDLINE | ID: mdl-37162278

ABSTRACT

A critical task in single-cell RNA sequencing (scRNA-Seq) data analysis is to identify cell types from heterogeneous tissues. While the majority of classification methods demonstrated high performance in scRNA-Seq annotation problems, a robust and accurate solution is desired to generate reliable outcomes for downstream analyses, for instance, marker gene identification, detection of differentially expressed genes, and pathway analysis. It is hard to establish a universally good metric. Thus, a universally good classification method for all kinds of scenarios does not exist. In addition, reference and query data in cell classification are usually from different experimental batches, and failure to consider batch effects may result in misleading conclusions. To overcome this bottleneck, we propose a robust ensemble approach to classify cells and utilize a batch correction method between reference and query data. We simulated four scenarios that range from simple to complex batch effects and account for varying cell-type proportions. We further tested our approach on both lung and pancreas data. We found improved prediction accuracy and robust performance across simulation scenarios and real data. The incorporation of batch effect correction between reference and query, and the ensemble approach improve cell-type prediction accuracy while maintaining robustness. We demonstrated these through simulated and real scRNA-Seq data.

2.
ArXiv ; 2023 Oct 02.
Article in English | MEDLINE | ID: mdl-34012994

ABSTRACT

In modern statistics, interest has shifted from pursuing the uniformly minimum variance unbiased estimator to reducing mean squared error (MSE) or residual squared error. Shrinkage-based estimation and regression methods offer better prediction accuracy and improved interpretation. However, the characterization of such optimal statistics in terms of minimizing MSE remains open and challenging in many problems, for example, estimating the treatment effect in adaptive clinical trials with pre-planned modifications to design aspects based on accumulated data. From an alternative perspective, we propose an automatic method based on deep neural networks to construct an improved estimator from existing ones. Theoretical properties are studied to provide guidance on the applicability of our estimator to seek potential improvement. Simulation studies demonstrate that the proposed method has considerable finite-sample efficiency gain as compared with several common estimators. In the Adaptive COVID-19 Treatment Trial (ACTT) as an important application, our ensemble estimator essentially contributes to a more ethical and efficient adaptive clinical trial with fewer patients enrolled. The proposed framework can be generally applied to various statistical problems, and can serve as a reference measure to guide statistical research.

3.
Biometrics ; 2022 Dec 31.
Article in English | MEDLINE | ID: mdl-36585916

ABSTRACT

In recent years, the field of precision medicine has seen many advancements. Significant focus has been placed on creating algorithms to estimate individualized treatment rules (ITRs), which map from patient covariates to the space of available treatments with the goal of maximizing patient outcome. Direct learning (D-Learning) is a recent one-step method which estimates the ITR by directly modeling the treatment-covariate interaction. However, when the variance of the outcome is heterogeneous with respect to treatment and covariates, D-Learning does not leverage this structure. Stabilized direct learning (SD-Learning), proposed in this paper, utilizes potential heteroscedasticity in the error term through a residual reweighting which models the residual variance via flexible machine learning algorithms such as XGBoost and random forests. We also develop an internal cross-validation scheme which determines the best residual model among competing models. SD-Learning improves the efficiency of D-Learning estimates in binary and multi-arm treatment scenarios. The method is simple to implement and an easy way to improve existing algorithms within the D-Learning family, including original D-Learning, Angle-based D-Learning (AD-Learning), and Robust D-learning (RD-Learning). We provide theoretical properties and justification of the optimality of SD-Learning. Head-to-head performance comparisons with D-Learning methods are provided through simulations, which demonstrate improvement in terms of average prediction error (APE), misclassification rate, and empirical value, along with a data analysis of an acquired immunodeficiency syndrome (AIDS) randomized clinical trial.

4.
Technometrics ; 64(1): 52-64, 2022.
Article in English | MEDLINE | ID: mdl-36312889

ABSTRACT

Budget constraints become an important consideration in modern predictive modeling due to the high cost of collecting certain predictors. This motivates us to develop cost-constrained predictive modeling methods. In this paper, we study a new high-dimensional cost-constrained linear regression problem, that is, we aim to find the cost-constrained regression model with the smallest expected prediction error among all models satisfying a budget constraint. The non-convex budget constraint makes this problem NP-hard. In order to estimate the regression coefficient vector of the cost-constrained regression model, we propose a new discrete first-order continuous optimization method. In particular, our method delivers a series of estimates of the regression coefficient vector by solving a sequence of 0-1 knapsack problems. Theoretically, we prove that the series of the estimates generated by our iterative algorithm converge to a first-order stationary point, which can be a globally optimal solution under some conditions. Furthermore, we study some extensions of our method that can be used for general statistical learning problems and problems with groups of variables. Numerical studies using simulated datasets and a real dataset from a diabetes study indicate that our proposed method can solve problems of fairly high dimensions with promising performance. Supplementary materials for this article are available online.
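As a rough illustration of the 0-1 knapsack subproblem mentioned in this abstract, the following is a minimal sketch of one textbook dynamic-programming solve with integer costs. It is not the authors' algorithm: the function name `knapsack_01`, the item values, and the costs are hypothetical, and the abstract does not say how the per-predictor values are derived at each iteration.

```python
def knapsack_01(values, costs, budget):
    """Solve one 0-1 knapsack instance with non-negative integer costs.

    values[i] is a hypothetical benefit score for predictor i,
    costs[i] its collection cost, budget the total allowed cost.
    Returns (best_total_value, sorted list of chosen indices).
    """
    n = len(values)
    # best[i][b] = max total value using only the first i items with cost <= b
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, c = values[i - 1], costs[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip item i-1
            if c <= b and best[i - 1][b - c] + v > best[i][b]:
                best[i][b] = best[i - 1][b - c] + v  # option 2: take it

    # Walk the table backwards to recover which items were taken.
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= costs[i - 1]
    return best[n][budget], sorted(chosen)


if __name__ == "__main__":
    total, picked = knapsack_01([6, 10, 12], [1, 2, 3], budget=5)
    print(total, picked)
```

The table has (number of items + 1) × (budget + 1) entries, so this sketch is only practical when costs are integers and the budget is moderate.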

5.
J Med Internet Res ; 24(3): e27934, 2022 03 01.
Article in English | MEDLINE | ID: mdl-35230244

ABSTRACT

BACKGROUND: Monitoring eating is central to the care of many conditions such as diabetes, eating disorders, heart diseases, and dementia. However, automatic tracking of eating in a free-living environment remains a challenge because of the lack of a mature system and large-scale, reliable training set. OBJECTIVE: This study aims to fill in this gap by an integrative engineering and machine learning effort and conducting a large-scale study in terms of monitoring hours on wearable-based eating detection. METHODS: This prospective, longitudinal, passively collected study, covering 3828 hours of records, was made possible by programming a digital system that streams diary, accelerometer, and gyroscope data from Apple Watches to iPhones and then transfers the data to the cloud. RESULTS: On the basis of this data collection, we developed deep learning models leveraging spatial and time augmentation and inferring eating at an area under the curve (AUC) of 0.825 within 5 minutes in the general population. In addition, the longitudinal follow-up of the study design encouraged us to develop personalized models that detect eating behavior at an AUC of 0.872. When aggregated to individual meals, the AUC is 0.951. We then prospectively collected an independent validation cohort in a different season of the year and validated the robustness of the models (0.941 for meal-level aggregation). CONCLUSIONS: The accuracy of this model and the data streaming platform promises immediate deployment for monitoring eating in applications such as diabetic integrative care.


Subjects
Machine Learning; Meals; Area Under Curve; Feeding Behavior; Humans; Prospective Studies
6.
Stat Med ; 41(4): 719-735, 2022 02 20.
Article in English | MEDLINE | ID: mdl-34786731

ABSTRACT

Statistical methods generating individualized treatment rules (ITRs) often focus on maximizing expected benefit, but these rules may expose patients to excess risk. For instance, aggressive treatment of type 2 diabetes (T2D) with insulin therapies may result in an ITR which controls blood glucose levels but increases rates of hypoglycemia, diminishing the appeal of the ITR. This work proposes two methods to identify risk-controlled ITRs (rcITR), a class of ITR which maximizes a benefit while controlling risk at a prespecified threshold. A novel penalized recursive partitioning algorithm is developed which optimizes an unconstrained, penalized value function. The final rule is a risk-controlled decision tree (rcDT) that is easily interpretable. A natural extension of the rcDT model, risk controlled random forests (rcRF), is also proposed. Simulation studies demonstrate the robustness of rcRF modeling. Three variable importance measures are proposed to further guide clinical decision-making. Both rcDT and rcRF procedures can be applied to data from randomized controlled trials or observational studies. An extensive simulation study interrogates the performance of the proposed methods. A data analysis of the DURABLE diabetes trial in which two therapeutics were compared is additionally presented. An R package implements the proposed methods (https://github.com/kdoub5ha/rcITR).


Subjects
Diabetes Mellitus, Type 2; Precision Medicine; Algorithms; Computer Simulation; Decision Trees; Diabetes Mellitus, Type 2/drug therapy; Humans; Precision Medicine/methods
7.
Biometrika ; 108(1): 183-198, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33840817

ABSTRACT

Progression of chronic disease is often manifested by repeated occurrences of disease-related events over time. Delineating the heterogeneity in the risk of such recurrent events can provide valuable scientific insight for guiding customized disease management. In this paper, we propose a new sensible measure of individual risk of recurrent events and present a dynamic modeling framework thereof, which accounts for both observed covariates and unobservable frailty. The proposed modeling requires no distributional specification of the unobservable frailty, while permitting the exploration of dynamic effects of the observed covariates. We develop estimation and inference procedures for the proposed model through a novel adaptation of the principle of conditional score. The asymptotic properties of the proposed estimator, including the uniform consistency and weak convergence, are established. Extensive simulation studies demonstrate satisfactory finite-sample performance of the proposed method. We illustrate the practical utility of the new method via an application to a diabetes clinical trial that explores the risk patterns of hypoglycemia in Type 2 diabetes patients.

8.
J Biopharm Stat ; 31(1): 5-13, 2021 01 02.
Article in English | MEDLINE | ID: mdl-32419590

ABSTRACT

Hypoglycemia is a major safety concern for diabetic patients. Hypoglycemic events can be modeled based on time to recurrent events or count data. In this article, we evaluated a gamma frailty model with variance estimated by the inverse of observed Fisher information matrix, a gamma frailty model with the sandwich variance estimator, and a piecewise negative binomial regression model. Simulations showed that the sandwich variance estimator performed better when the frailty model is mis-specified, and the piecewise negative binomial regression sometimes fails to converge. All three methods were applied to a dataset from a clinical trial evaluating insulin treatments.


Subjects
Hypoglycemia; Humans; Hypoglycemia/epidemiology; Models, Statistical; Recurrence
9.
Biometrics ; 77(4): 1254-1264, 2021 12.
Article in English | MEDLINE | ID: mdl-32918486

ABSTRACT

One central task in precision medicine is to establish individualized treatment rules (ITRs) for patients with heterogeneous responses to different therapies. Motivated by a randomized clinical trial comparing two drugs, pioglitazone and gliclazide, in Type 2 diabetic patients, we consider the problem of utilizing promising candidate biomarkers to improve an existing ITR. This calls for a biomarker evaluation procedure that makes it possible to gauge the added value of individual biomarkers. We propose an assessment analytic, termed the net benefit index (NBI), that quantifies the contrast between the resulting gain and loss of treatment benefits when a biomarker enters the ITR to reallocate patients between treatments. We optimize reallocation schemes via outcome weighted learning (OWL), from which the optimal treatment group labels are generated by a weighted support vector machine (SVM). To account for sampling uncertainty in assessing a biomarker, we propose an NBI-based test for a significant improvement over the existing ITR, where the empirical null distribution is constructed via the method of stratified permutation by treatment arms. Applying NBI to the motivating diabetes trial, we found that baseline fasting insulin is an important biomarker that leads to an improvement over an existing ITR based only on the patient's baseline fasting plasma glucose (FPG), age, and body mass index (BMI) to reduce FPG over a period of 52 weeks.


Subjects
Diabetes Mellitus, Type 2; Precision Medicine; Biomarkers; Diabetes Mellitus, Type 2/drug therapy; Humans; Hypoglycemic Agents/therapeutic use; Learning; Machine Learning; Precision Medicine/methods; Research Design
10.
Stat Sin ; 30: 1857-1879, 2020.
Article in English | MEDLINE | ID: mdl-33311956

ABSTRACT

Due to the heterogeneity of many chronic diseases, precise personalized medicine, also known as precision medicine, has drawn increasing attention in the scientific community. One main goal of precision medicine is to develop the most effective tailored therapy for each individual patient. To that end, one needs to incorporate individual characteristics to detect a proper individual treatment rule (ITR), by which suitable decisions on treatment assignments can be made to optimize patients' clinical outcome. For binary treatment settings, outcome weighted learning (OWL) and several of its variations have been proposed recently to estimate the ITR by optimizing the conditional expected outcome given patients' information. However, for multiple treatment scenarios, it remains unclear how to use OWL effectively. It can be shown that some direct extensions of OWL for multiple treatments, such as one-versus-one and one-versus-rest methods, can yield suboptimal performance. In this paper, we propose a new learning method, named Multicategory Outcome weighted Margin-based Learning (MOML), for estimating ITR with multiple treatments. Our proposed method is very general and covers OWL as a special case. We show Fisher consistency for the estimated ITR, and establish convergence rate properties. Variable selection using the sparse ℓ1 penalty is also considered. Analysis of simulated examples and a type 2 diabetes mellitus observational study are used to demonstrate competitive performance of the proposed method.
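To make the quantity behind OWL-type methods more concrete, here is a minimal sketch of the standard inverse-probability-weighted estimate of the expected outcome under a given treatment rule, assuming known treatment-assignment probabilities (as in a randomized trial). It does not implement OWL or MOML themselves; the function name `ipw_value` and the toy data are hypothetical.

```python
def ipw_value(outcomes, treatments, propensities, rule_assignments):
    """Estimate the mean outcome that would result from following a rule.

    outcomes[i]         observed outcome of patient i (larger is better)
    treatments[i]       treatment patient i actually received
    propensities[i]     probability of receiving that observed treatment
    rule_assignments[i] treatment the rule would recommend for patient i

    Only patients whose observed treatment agrees with the rule contribute,
    each up-weighted by the inverse of their assignment probability.
    """
    total = 0.0
    for y, a, p, d in zip(outcomes, treatments, propensities, rule_assignments):
        if a == d:
            total += y / p
    return total / len(outcomes)


if __name__ == "__main__":
    value = ipw_value(
        outcomes=[2.0, 4.0, 6.0, 8.0],
        treatments=[1, -1, 1, -1],
        propensities=[0.5, 0.5, 0.5, 0.5],
        rule_assignments=[1, 1, 1, -1],
    )
    print(value)
```

Because the agreement indicator is not smooth, weighted-classification approaches typically replace it with a convex surrogate loss before optimizing over rules.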

11.
Biometrics ; 76(4): 1075-1086, 2020 12.
Article in English | MEDLINE | ID: mdl-32365232

ABSTRACT

Individualized treatment rules (ITRs) tailor medical treatments according to patient-specific characteristics in order to optimize patient outcomes. Data from randomized controlled trials (RCTs) are used to infer valid ITRs using statistical and machine learning methods. However, RCTs are usually conducted under specific inclusion/exclusion criteria, thus limiting their generalizability to a broader patient population in real-world practice settings. Because electronic health records (EHRs) document treatment prescriptions in the real world, transferring information in EHRs to RCTs, if done appropriately, could potentially improve the performance of ITRs, in terms of precision and generalizability. In this work, we propose a new domain adaptation method to learn ITRs by incorporating information from EHRs. Unless we assume that there is no unmeasured confounding in EHRs, we cannot directly learn the optimal ITR from the combined EHR and RCT data. Instead, we first pretrain "super" features from EHRs that summarize physician treatment decisions and patient observed benefits in the real world, as these are likely to be informative of the optimal ITRs. We then augment the feature space of the RCT and learn the optimal ITRs by stratifying by super features using subjects enrolled in RCT. We adopt Q-learning and a modified matched-learning algorithm for estimation. We present heuristic justification of our method and conduct simulation studies to demonstrate the performance of super features. Finally, we apply our method to transfer information learned from EHRs of patients with type 2 diabetes to learn individualized insulin therapies from RCT data.


Subjects
Electronic Health Records; Machine Learning; Algorithms; Humans; Randomized Controlled Trials as Topic; Research Design
12.
J Am Stat Assoc ; 115(530): 678-691, 2020.
Article in English | MEDLINE | ID: mdl-34219848

ABSTRACT

Estimating an optimal individualized treatment rule (ITR) based on patients' information is an important problem in precision medicine. An optimal ITR is a decision function that optimizes patients' expected clinical outcomes. Many existing methods in the literature are designed for binary treatment settings where the outcome of interest is continuous. Much less work has been done on estimating optimal ITRs in multiple treatment settings with good interpretations. In this article, we propose angle-based direct learning (AD-learning) to efficiently estimate optimal ITRs with multiple treatments. Our proposed method can be applied to various types of outcomes, such as continuous, survival, or binary outcomes. Moreover, it has an interesting geometric interpretation of the effect of different treatments for each individual patient, which can help doctors and patients make better decisions. Finite sample error bounds have been established to provide a theoretical guarantee for AD-learning. Finally, we demonstrate the superior performance of our method via an extensive simulation study and real data applications. Supplementary materials for this article are available online.

13.
Article in English | MEDLINE | ID: mdl-34335111

ABSTRACT

The individualized treatment recommendation (ITR) is an important analytic framework for precision medicine. The goal of ITR is to assign the best treatments to patients based on their individual characteristics. From the machine learning perspective, the solution to the ITR problem can be formulated as a weighted classification problem to maximize the mean benefit from the recommended treatments given patients' characteristics. Several ITR methods have been proposed in both the binary setting and the multicategory setting. In practice, one may prefer a more flexible recommendation that includes multiple treatment options. This motivates us to develop methods to obtain a set of near-optimal individualized treatment recommendations alternative to each other, called alternative individualized treatment recommendations (A-ITR). We propose two methods to estimate the optimal A-ITR within the outcome weighted learning (OWL) framework. Simulation studies and a real data analysis for Type 2 diabetic patients with injectable antidiabetic treatments are conducted to show the usefulness of the proposed A-ITR framework. We also show the consistency of these methods and obtain an upper bound for the risk between the theoretically optimal recommendation and the estimated one. An R package aitr has been developed, found at https://github.com/menghaomiao/aitr.

15.
J Am Stat Assoc ; 114(528): 1854-1864, 2019.
Article in English | MEDLINE | ID: mdl-37982094

ABSTRACT

In comparing two treatments via a randomized clinical trial, the analysis of covariance (ANCOVA) technique is often utilized to estimate an overall treatment effect. The ANCOVA is generally perceived as a more efficient procedure than its simple two sample estimation counterpart. Unfortunately, when the ANCOVA model is nonlinear, the resulting estimator is generally not consistent. Recently, various nonparametric alternatives to the ANCOVA, such as the augmentation methods, have been proposed to estimate the treatment effect by adjusting the covariates. However, the properties of these alternatives have not been studied in the presence of treatment allocation imbalance. In this article, we take a different approach to explore how to improve the precision of the naive two-sample estimate even when the observed distributions of baseline covariates between two groups are dissimilar. Specifically, we derive a bias-adjusted estimation procedure constructed from a conditional inference principle via relevant ancillary statistics from the observed covariates. This estimator is shown to be asymptotically equivalent to an augmentation estimator under the unconditional setting. We utilize the data from a clinical trial for evaluating a combination treatment of cardiovascular diseases to illustrate our findings.

16.
J Appl Stat ; 46(16): 2884-2904, 2019.
Article in English | MEDLINE | ID: mdl-32132765

ABSTRACT

Quantile regression has demonstrated promising utility in longitudinal data analysis. Existing work is primarily focused on modeling cross-sectional outcomes, while outcome trajectories often carry more substantive information in practice. In this work, we develop a trajectory quantile regression framework that is designed to robustly and flexibly investigate how latent individual trajectory features are related to observed subject characteristics. The proposed models are built under multilevel modeling with usual parametric assumptions lifted or relaxed. We derive our estimation procedure by novelly transforming the problem at hand to quantile regression with perturbed responses and adapting the bias correction technique for handling covariate measurement errors. We establish desirable asymptotic properties of the proposed estimator, including uniform consistency and weak convergence. Extensive simulation studies confirm the validity of the proposed method as well as its robustness. An application to the DURABLE trial uncovers sensible scientific findings and illustrates the practical value of our proposals.

17.
Stat Med ; 38(3): 315-325, 2019 02 10.
Article in English | MEDLINE | ID: mdl-30302780

ABSTRACT

The weighted average treatment effect is a causal measure for the comparison of interventions in a specific target population, which may be different from the population where data are sampled from. For instance, when the goal is to introduce a new treatment to a target population, the question is what efficacy (or effectiveness) can be gained by switching patients from a standard of care (control) to this new treatment, for which the average treatment effect for the control estimand can be applied. In this paper, we propose two estimators based on augmented inverse probability weighting to estimate the weighted average treatment effect for a well-defined target population (ie, there exists a predefined target function of covariates that characterizes the population of interest, for example, a function of age to focus on elderly diabetic patients using samples from the US population). The first proposed estimator is doubly robust if the target function is known or can be correctly specified. The second proposed estimator is doubly robust if the target function has a linear dependence on the propensity score, which can be used to estimate the average treatment effect for the treated and the average treatment effect for the control. We demonstrate the properties of the proposed estimators through theoretical proof and simulation studies. We also apply our proposed methods in a comparison of glucagon-like peptide-1 receptor agonists therapy and insulin therapy among patients with type 2 diabetes, using the UK Clinical Practice Research Datalink data.
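For orientation, the following is a minimal sketch of the standard (unweighted) augmented inverse probability weighting estimator of the average treatment effect, which is the building block this abstract refers to. It is not the weighted estimators proposed in the paper: the target function is omitted, and the function name `aipw_ate`, the fitted values `m1`/`m0`, and the toy data are hypothetical inputs assumed to come from models fitted elsewhere.

```python
def aipw_ate(y, a, e, m1, m0):
    """Standard AIPW estimate of the average treatment effect.

    y[i]  observed outcome
    a[i]  treatment indicator (1 = treated, 0 = control)
    e[i]  estimated propensity score P(A = 1 | X_i), strictly between 0 and 1
    m1[i] outcome-model prediction for patient i under treatment
    m0[i] outcome-model prediction for patient i under control

    Each arm's mean is the outcome-model prediction plus an inverse-
    probability-weighted residual correction for patients observed in that arm.
    """
    total = 0.0
    for yi, ai, ei, m1i, m0i in zip(y, a, e, m1, m0):
        treated_term = m1i + ai * (yi - m1i) / ei
        control_term = m0i + (1 - ai) * (yi - m0i) / (1 - ei)
        total += treated_term - control_term
    return total / len(y)


if __name__ == "__main__":
    estimate = aipw_ate(
        y=[3.0, 1.0, 4.0, 2.0],
        a=[1, 0, 1, 0],
        e=[0.5, 0.5, 0.5, 0.5],
        m1=[3.0, 3.0, 3.0, 3.0],
        m0=[1.0, 1.0, 1.0, 1.0],
    )
    print(estimate)
```

The estimator is consistent if either the propensity model or the outcome model is correctly specified, which is the usual sense of "doubly robust".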


Subjects
Data Interpretation, Statistical; Treatment Outcome; Adult; Age Factors; Aged; Diabetes Mellitus, Type 2/drug therapy; Female; Humans; Hypoglycemic Agents/therapeutic use; Male; Middle Aged; Models, Statistical; Probability; Propensity Score
18.
J Biopharm Stat ; 29(2): 287-305, 2019.
Article in English | MEDLINE | ID: mdl-30359554

ABSTRACT

Dose titration is becoming increasingly common as a way to improve drug tolerability and to determine individualized treatment doses, thereby maximizing the benefit to patients. Dose titration starting from a lower dose and gradually increasing to a higher dose enables improved tolerability in patients as the human body may gradually adapt to adverse gastrointestinal effects. Current statistical analyses mostly focus on the outcome at the end-of-study follow-up without considering the longitudinal impact of dose titration on the outcome. Better understanding of the dynamic effect of dose titration over time is important in early-phase clinical development, as it could allow the longitudinal trend to be modeled and the longer-term outcome to be predicted more accurately. We propose a parametric model with two empirical methods of modeling the error terms for a continuous outcome with dose titrations. Simulations show that both approaches to modeling the error terms work well. We applied this method to analyze data from a few clinical studies and achieved satisfactory results.


Subjects
Drug-Related Side Effects and Adverse Reactions/prevention & control; Glucagon-Like Peptides/administration & dosage; Hypoglycemic Agents/administration & dosage; Models, Statistical; Randomized Controlled Trials as Topic/methods; Computer Simulation; Dose-Response Relationship, Drug; Drug Administration Schedule; Drug-Related Side Effects and Adverse Reactions/epidemiology; Glucagon-Like Peptide 1/agonists; Glucagon-Like Peptides/adverse effects; Glucagon-Like Peptides/therapeutic use; Humans; Hypoglycemic Agents/adverse effects; Hypoglycemic Agents/therapeutic use; Randomized Controlled Trials as Topic/statistics & numerical data; Treatment Outcome
19.
Stat Med ; 37(25): 3589-3598, 2018 11 10.
Article in English | MEDLINE | ID: mdl-30047148

ABSTRACT

To evaluate the totality of one treatment's benefit/risk profile relative to an alternative treatment via a longitudinal comparative clinical study, the timing and occurrence of multiple clinical events are typically collected during the patient's follow-up. These multiple observations reflect the patient's disease progression/burden over time. The standard practice is to create a composite endpoint from the multiple outcomes, the timing of the occurrence of the first clinical event, to evaluate the treatment via the standard survival analysis techniques. By ignoring all events after the composite outcome, this type of assessment may not be ideal. Various parametric or semiparametric procedures have been extensively discussed in the literature for the purposes of analyzing multiple event-time data. Many existing methods were developed based on extensive model assumptions. When the model assumptions are not plausible, the resulting inferences for the treatment effect may be misleading. In this article, we propose a simple, nonparametric inference procedure to quantify the treatment effect, which has an intuitive clinically meaningful interpretation. We use the data from a cardiovascular clinical trial for heart failure to illustrate the procedure. A simulation study is also conducted to evaluate the performance of the new proposal.


Subjects
Data Interpretation, Statistical; Longitudinal Studies; Treatment Outcome; Area Under Curve; Humans; Models, Statistical; Proportional Hazards Models; Randomized Controlled Trials as Topic; Survival Analysis; Time Factors
20.
Stat Med ; 37(27): 3869-3886, 2018 11 30.
Article in English | MEDLINE | ID: mdl-30014497

ABSTRACT

With advances in drug development, multiple treatments are available for a single disease. Patients can often benefit from taking multiple treatments simultaneously. For example, patients in Clinical Practice Research Datalink with chronic diseases such as type 2 diabetes can receive multiple treatments simultaneously. Therefore, it is important to estimate which combination therapy benefits each patient the most. However, recommending the best treatment combination is not a single-label but a multilabel classification problem. In this paper, we propose a novel outcome weighted deep learning algorithm to estimate individualized optimal combination therapy. The Fisher consistency of the proposed loss function under certain conditions is also provided. In addition, we extend our method to a family of loss functions, which allows adaptive changes based on treatment interactions. We demonstrate the performance of our methods through simulations and real data analysis.


Subjects
Algorithms; Drug Therapy, Combination; Machine Learning; Precision Medicine; Statistics as Topic/methods; Treatment Outcome; Decision Support Techniques; Drug Therapy, Combination/methods; Humans; Models, Statistical; Precision Medicine/methods; Stochastic Processes