Results 1 - 6 of 6
1.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37419612

ABSTRACT

Missing values (MVs) can adversely impact data analysis and machine-learning model development. We propose a novel mixed-model method for missing value imputation (MVI). This method, ProJect (short for Protein inJection), is a powerful and meaningful improvement over existing MVI methods such as Bayesian principal component analysis (PCA), probabilistic PCA, local least squares and quantile regression imputation of left-censored data. We rigorously tested ProJect on various high-throughput data types, including genomics and mass spectrometry (MS)-based proteomics. Specifically, we utilized renal cancer (RC) data acquired using DIA-SWATH, ovarian cancer (OC) data acquired using DIA-MS, and bladder (BladderBatch) and glioblastoma (GBM) microarray gene expression datasets. Our results demonstrate that ProJect consistently performs better than the other referenced MVI methods. It achieves the lowest normalized root mean square error (on average, 45.92% less error in RC_C, 27.37% in RC_full, 29.22% in OC, 23.65% in BladderBatch and 20.20% in GBM relative to the closest competing method) and the lowest Procrustes sum of squared error (Procrustes SS; 79.71% less error in RC_C, 38.36% in RC_full, 18.13% in OC, 74.74% in BladderBatch and 30.79% in GBM compared to the next best method). ProJect also achieves the highest correlation coefficient across all types of MV combinations (0.64% higher in RC_C, 0.24% in RC_full, 0.55% in OC, 0.39% in BladderBatch and 0.27% in GBM versus the second-best performing method). ProJect's key strength is its ability to handle the different types of MVs commonly found in real-world data. Unlike most MVI methods, which are designed to handle only one type of MV, ProJect employs a decision-making algorithm that first determines whether an MV is missing at random or missing not at random. It then applies a targeted imputation strategy for each MV type, resulting in more accurate and reliable imputation outcomes.
An R implementation of ProJect is available at https://github.com/miaomiao6606/ProJect.
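
The error comparisons quoted in this abstract can be made concrete with a small sketch. The abstract does not state which normalization ProJect's evaluation uses, so the version below assumes one common convention (RMSE divided by the standard deviation of the true values). The function names `nrmse` and `relative_reduction` are hypothetical and are not taken from the ProJect repository.

```python
import math


def nrmse(true_values, imputed_values):
    """RMSE between true and imputed entries, divided by the standard
    deviation of the true values (an assumed normalization)."""
    if len(true_values) != len(imputed_values) or len(true_values) < 2:
        raise ValueError("need two equal-length sequences with at least 2 entries")
    n = len(true_values)
    mse = sum((t - p) ** 2 for t, p in zip(true_values, imputed_values)) / n
    mean_true = sum(true_values) / n
    var_true = sum((t - mean_true) ** 2 for t in true_values) / n
    if var_true == 0:
        raise ValueError("true values have zero variance")
    return math.sqrt(mse / var_true)


def relative_reduction(error_method, error_competitor):
    """Percentage by which a method's error is lower than a competitor's."""
    return 100.0 * (error_competitor - error_method) / error_competitor


# Imputing every masked entry with the mean gives NRMSE = 1 under this
# normalization, which is a convenient baseline for comparison.
truth = [1.0, 2.0, 3.0, 4.0]
print(nrmse(truth, [2.5, 2.5, 2.5, 2.5]))
print(relative_reduction(0.5, 1.0))
```

In a typical evaluation of this kind, known entries are masked, imputed, and then compared against their held-out true values; only the masked entries would be passed to `nrmse`.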


Subjects
Algorithms , Genomics , Bayes Theorem , Oligonucleotide Array Sequence Analysis/methods , Mass Spectrometry/methods
2.
Biometrics ; 72(4): 1066-1077, 2016 12.
Article in English | MEDLINE | ID: mdl-27060877

ABSTRACT

Semi-parametric methods are often used for the estimation of intervention effects on correlated outcomes in cluster-randomized trials (CRTs). When outcomes are missing at random (MAR), Inverse Probability Weighted (IPW) methods incorporating baseline covariates can be used to deal with informative missingness. Also, augmented generalized estimating equations (AUG) correct for imbalance in baseline covariates but need to be extended for MAR outcomes. However, in the presence of interactions between treatment and baseline covariates, neither method alone produces consistent estimates for the marginal treatment effect if the model for interaction is not correctly specified. We propose an AUG-IPW estimator that weights by the inverse of the probability of being a complete case and allows different outcome models in each intervention arm. This estimator is doubly robust (DR); it gives correct estimates if either the missing data process or the outcome model is correctly specified. We consider the problem of covariate interference, which arises when the outcome of an individual may depend on covariates of other individuals. When interfering covariates are not modeled, the DR property prevents bias as long as covariate interference is not present simultaneously for the outcome and the missingness. An R package is developed implementing the proposed method. An extensive simulation study and an application to a CRT of an HIV risk-reduction intervention in South Africa illustrate the method.
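
The doubly robust idea described in this abstract can be illustrated with a deliberately simplified sketch: an augmented IPW estimate of a single arm's mean outcome, ignoring clustering and the estimating-equation machinery. This is not the authors' AUG-IPW estimator or their R package. The function `aipw_mean` and its arguments are hypothetical, and the complete-case probabilities and outcome predictions are assumed to come from models fitted elsewhere.

```python
def aipw_mean(y, observed, p_observed, y_predicted):
    """Augmented inverse-probability-weighted mean of an outcome.

    y           -- outcomes (any placeholder, e.g. None, where missing)
    observed    -- 1 if the outcome was observed, 0 if missing
    p_observed  -- modelled probability of being a complete case
    y_predicted -- outcome-model prediction for every individual
    """
    total = 0.0
    for yi, ri, pi, mi in zip(y, observed, p_observed, y_predicted):
        if not 0.0 < pi <= 1.0:
            raise ValueError("complete-case probabilities must lie in (0, 1]")
        ipw_term = yi / pi if ri else 0.0      # weighted observed outcome
        augmentation = (ri - pi) / pi * mi     # correction from the outcome model
        total += ipw_term - augmentation
    return total / len(y)


# Outcome model exactly right, missingness model arbitrary (0.5 everywhere):
# the estimate still recovers the full-data mean of 2.5.
y = [1.0, 2.0, None, 4.0]
observed = [1, 1, 0, 1]
print(aipw_mean(y, observed, [0.5] * 4, [1.0, 2.0, 3.0, 4.0]))
```

Under these assumptions a marginal treatment effect would be the difference between `aipw_mean` evaluated in each arm, with a separate outcome model per arm as the abstract describes.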


Subjects
Cluster Analysis , Models, Statistical , Randomized Controlled Trials as Topic/statistics & numerical data , Computer Simulation , Data Interpretation, Statistical , HIV Infections , Humans , Risk , Treatment Outcome
3.
Pharm Stat ; 15(4): 349-61, 2016 07.
Article in English | MEDLINE | ID: mdl-27169874

ABSTRACT

By examining the outcome trajectories of patients who dropped out of schizophrenia trials for different reasons, we note that although patients are recruited under the same protocol and have comparable baseline characteristics, they may respond differently even to the same treatment. Some patients show consistent improvement while others only have temporary relief. This creates different patient subpopulations characterized by their response and dropout patterns. At the same time, those who continue to improve seem more likely to complete the study, while those who only experience temporary relief have a higher chance of dropping out. Such a phenomenon appears to be quite general in schizophrenia clinical trials. This simultaneous inhomogeneity in both patient response and dropout patterns creates a scenario of missing not at random and therefore results in bias when statistical methods based on the missing at random assumption are used to test treatment efficacy. In this paper, we propose to use the latent class growth mixture model, which is a special case of the latent mixture model, to conduct the statistical analyses in such situations. This model allows us to take the inhomogeneity among subpopulations into consideration and to make more accurate inferences on the treatment effect at any visit time. Compared with conventional statistical methods such as the mixed-effects model for repeated measures, we demonstrate through simulations that the proposed latent mixture model approach gives better control of the Type I error rate in testing the treatment effect. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.


Subjects
Models, Statistical , Patient Dropouts , Randomized Controlled Trials as Topic/statistics & numerical data , Schizophrenia/therapy , Humans , Randomized Controlled Trials as Topic/methods , Schizophrenia/diagnosis , Treatment Outcome
4.
Adv Stat Anal ; 101(3): 267-288, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28947920

ABSTRACT

The density function is a fundamental concept in data analysis. When a population consists of heterogeneous subjects, it is often of great interest to estimate the density functions of the subpopulations. Nonparametric methods such as kernel smoothing estimates may be applied to each subpopulation to estimate the density functions if there are no missing values. In situations where the membership for a subpopulation is missing, kernel smoothing estimates using only subjects with membership available are valid only under missing completely at random (MCAR). In this paper, we propose new kernel smoothing methods for density function estimates by applying prediction models of the membership under the missing at random (MAR) assumption. The asymptotic properties of the new estimates are developed, and simulation studies and a real study in mental health are used to illustrate the performance of the new estimates.
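
One way to picture the approach in this abstract is a kernel density estimate in which subjects with unknown membership contribute with a weight equal to their predicted membership probability. The sketch below assumes that form; the abstract does not give the paper's estimator, so this may differ from it. The Gaussian kernel, the fixed bandwidth, and the names `membership_weights` and `weighted_kde` are illustrative choices.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def membership_weights(membership, predicted_prob):
    """Weight 1 or 0 where membership is known; the predicted probability
    of belonging to the subpopulation where it is missing (None)."""
    return [p if m is None else float(m) for m, p in zip(membership, predicted_prob)]


def weighted_kde(x, data, weights, bandwidth):
    """Weighted kernel density estimate at the point x."""
    total_weight = sum(weights)
    if bandwidth <= 0 or total_weight <= 0:
        raise ValueError("bandwidth and total weight must be positive")
    smoothed = sum(w * gaussian_kernel((x - xi) / bandwidth)
                   for xi, w in zip(data, weights))
    return smoothed / (total_weight * bandwidth)


# Third subject has missing membership; a prediction model (not shown,
# assumed fitted under MAR) gives it probability 0.3 of belonging.
data = [0.1, 1.5, 0.4]
weights = membership_weights([1, 0, None], [0.9, 0.2, 0.3])
print(weighted_kde(0.0, data, weights, bandwidth=0.5))
```

Dropping the third subject instead of weighting it would correspond to the complete-case estimate, which the abstract notes is valid only under MCAR.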

5.
Front Psychol ; 8: 722, 2017.
Article in English | MEDLINE | ID: mdl-28553242

ABSTRACT

The missing not at random (MNAR) mechanism may bias parameter estimates and even distort study results. This study compared the maximum likelihood (ML) selection model based on the missing at random (MAR) mechanism and the Diggle-Kenward selection model based on the MNAR mechanism for handling missing data through a Monte Carlo simulation study. Four factors were considered: the missingness mechanism, the dropout rate, the distribution shape (i.e., skewness and kurtosis), and the sample size. The results indicated that: (1) Under the MAR mechanism, the Diggle-Kenward selection model yielded estimates similar to those of the ML approach; under the MNAR mechanism, the ML approach underestimated the parameters, especially the intercept mean and slope mean (µ_i and µ_s). (2) Under the MAR mechanism, the 95% coverage probability (CP) of the Diggle-Kenward selection model was lower than that of the ML method; under the MNAR mechanism, the 95% CPs of both methods were below the desired level of 95%, but the Diggle-Kenward selection model yielded much higher coverage probabilities than the ML method. (3) The Diggle-Kenward selection model was more easily influenced than the ML approach by the degree of non-normality of the target variable's distribution. The dropout rate was the major factor affecting the precision of parameter estimation, and the dropout rate-induced difference between the two methods can be ignored only when the dropout rate falls below 10%.

6.
Ann Transl Med ; 3(22): 356, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26807411

ABSTRACT

Functions shipped with base R can fulfill many missing data handling tasks. However, because the data volume of electronic medical record (EMR) systems is typically very large, more sophisticated methods may be helpful in data management. The article focuses on missing data handling using advanced techniques. There are three types of missing data: missing completely at random (MCAR), missing at random (MAR) and not missing at random (NMAR). This classification depends on how the missing values are generated. Two packages, Multivariate Imputation by Chained Equations (MICE) and Visualization and Imputation of Missing Values (VIM), provide sophisticated functions to explore missing data patterns. In particular, the VIM package is especially helpful for visual inspection of missing data. Finally, correlation analysis provides information on the dependence of missing data on other variables. Such information is useful in subsequent imputations.
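
The three mechanisms named in this abstract, and the use of correlation analysis to detect dependence of missingness on other variables, can be illustrated with a small simulation. The sketch below uses plain Python rather than the MICE or VIM packages discussed in the article. The data-generating model, the logistic missingness probabilities, and the function names are all assumptions chosen for illustration.

```python
import math
import random


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(mechanism, n=5000, seed=0):
    """Draw (x, y) pairs and a 0/1 indicator of whether y is missing."""
    rng = random.Random(seed)
    xs, ys, missing = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = x + rng.gauss(0.0, 1.0)
        if mechanism == "MCAR":
            p = 0.3              # unrelated to any variable
        elif mechanism == "MAR":
            p = logistic(x)      # depends only on the observed covariate x
        elif mechanism == "NMAR":
            p = logistic(y)      # depends on the value that goes missing
        else:
            raise ValueError("unknown mechanism: " + mechanism)
        xs.append(x)
        ys.append(y)
        missing.append(1 if rng.random() < p else 0)
    return xs, ys, missing


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_b)


for mechanism in ("MCAR", "MAR", "NMAR"):
    xs, ys, missing = simulate(mechanism)
    print(mechanism, "corr(missing indicator, x) =", round(pearson(missing, xs), 3))
```

In this setup the correlation between the missingness indicator and `x` should be close to zero under MCAR and clearly positive under MAR. Under NMAR the missingness depends on `y` itself, which observed data alone cannot fully confirm.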
