Search | Biblioteca Virtual em Saúde (Virtual Health Library)

1.

A deep learning approach for overall survival prediction in lung cancer with missing values.

Caruso, Camillo Maria; Guarrasi, Valerio; Ramella, Sara; Soda, Paolo.

Comput Methods Programs Biomed ; 254: 108308, 2024 Sep.

Article in English | MEDLINE | ID: mdl-38968829

ABSTRACT

BACKGROUND AND OBJECTIVE: In the field of lung cancer research, particularly in the analysis of overall survival (OS), artificial intelligence (AI) serves crucial roles with specific aims. Given the prevalent issue of missing data in the medical domain, our primary objective is to develop an AI model capable of dynamically handling this missing data. Additionally, we aim to leverage all accessible data, effectively analyzing both uncensored patients who have experienced the event of interest and censored patients who have not, by embedding a specialized technique within our AI model, not commonly utilized in other AI tasks. Through the realization of these objectives, our model aims to provide precise OS predictions for non-small cell lung cancer (NSCLC) patients, thus overcoming these significant challenges. METHODS: We present a novel approach to survival analysis with missing values in the context of NSCLC, which exploits the strengths of the transformer architecture to account only for available features without requiring any imputation strategy. More specifically, this model tailors the transformer architecture to tabular data by adapting its feature embedding and masked self-attention to mask missing data and fully exploit the available ones. By making use of ad-hoc designed losses for OS, it is able to account for both censored and uncensored patients, as well as changes in risks over time. RESULTS: We compared our method with state-of-the-art models for survival analysis coupled with different imputation strategies. We evaluated the results obtained over a period of 6 years using different time granularities obtaining a Ct-index, a time-dependent variant of the C-index, of 71.97, 77.58 and 80.72 for time units of 1 month, 1 year and 2 years, respectively, outperforming all state-of-the-art methods regardless of the imputation method used. 
CONCLUSIONS: The results show that our model not only outperforms the state-of-the-art's performance but also simplifies the analysis in the presence of missing data, by effectively eliminating the need to identify the most appropriate imputation strategy for predicting OS in NSCLC patients.
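Illustrative sketch (not part of the abstract): the abstract describes masking missing features inside self-attention so that only available features contribute. The minimal Python sketch below shows the general masking idea for a single row of attention scores. It is not the authors' implementation; the function name `masked_attention_weights` and the toy scores are assumptions made for illustration.

```python
import math


def masked_attention_weights(scores, observed):
    """Softmax over attention scores, restricted to observed features.

    scores   -- raw attention scores, one per feature (hypothetical values)
    observed -- booleans, False where the feature value is missing
    Missing features receive weight 0, so they cannot influence the output.
    """
    if len(scores) != len(observed):
        raise ValueError("scores and observed must have the same length")
    if not any(observed):
        raise ValueError("at least one feature must be observed")
    # Subtract the largest observed score for numerical stability.
    top = max(s for s, o in zip(scores, observed) if o)
    exps = [math.exp(s - top) if o else 0.0 for s, o in zip(scores, observed)]
    total = sum(exps)
    return [e / total for e in exps]


# Second feature is missing, so it is excluded from the softmax.
weights = masked_attention_weights([2.0, 0.5, 1.0], [True, False, True])
```

Because the masked entry is dropped before normalisation, the remaining weights still sum to one and no imputed value is ever needed for the missing feature.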

Subjects

Carcinoma, Non-Small-Cell Lung; Deep Learning; Lung Neoplasms; Humans; Lung Neoplasms/mortality; Carcinoma, Non-Small-Cell Lung/mortality; Survival Analysis; Algorithms; Male; Female; Prognosis; Artificial Intelligence

2.

The FACT-GP5 as a global tolerability measure: responsiveness and robustness to missing assessments.

Arizmendi, Cara; Zhu, Yanyan; Khan, Maryam; Gable, Jonathon; Reeve, Bryce B; King-Kallimanis, Bellinda; Bell, Jill.

Qual Life Res ; 2024 Jul 24.

Article in English | MEDLINE | ID: mdl-39046616

ABSTRACT

PURPOSE: The Functional Assessment of Cancer Therapy item (FACT-GP5) has the potential to provide an understanding of global treatment tolerability from the patient perspective. Longitudinal evaluations of the FACT-GP5 and challenges posed by data missing-not-at-random (MNAR) have not been explored. Robustness of the FACT-GP5 to missing data assumptions and the responsiveness of the FACT-GP5 to key side-effects are evaluated. METHODS: In a randomized, double-blind study (NCT00065325), postmenopausal women (n = 618) with hormone receptor-positive (HR+), advanced breast cancer received either fulvestrant or exemestane and completed FACT measures monthly for seven months. Cumulative link mixed models (CLMM) were fit to evaluate: (1) the trajectory of the FACT-GP5 and (2) the responsiveness of the FACT-GP5 to CTCAE grade, Eastern Cooperative Oncology Group (ECOG) Performance Status scale, and key side-effects from the FACT. Sensitivity analyses of the missing-at-random (MAR) assumption were conducted. RESULTS: Odds of reporting worse side-effect bother increased over time. There were positive within-person relationships between level of side-effect bother (FACT-GP5) and severity of other FACT items, as well as ECOG performance status and Common Terminology Criteria for Adverse Events (CTCAE) grade. The number of missing FACT-GP5 assessments impacted the trajectory of the FACT-GP5 but did not impact the relationships between the FACT-GP5 and other items (except for nausea [FACT-GP2]). CONCLUSIONS: Results support the responsiveness of the FACT-GP5. Generally speaking, the responsiveness of the FACT-GP5 is robust to missing assessments. Missingness should be considered, however, when evaluating change over time of the FACT-GP5. TRIAL REGISTRATION: NCT00065325. TRIAL REGISTRATION YEAR: 2003.

Researchers have been exploring the use of a single question, FACT-GP5 ("I am bothered by side effects of treatment"), as a quick way to learn about drug tolerability from the patients' perspective. This study explores if this single question can capture changes in tolerability during treatment, and if the assessment is missed by patients, whether that impacts the interpretation of tolerability. In our study, we found that the FACT-GP5 can be used to understand how tolerability changes during treatment. Missing assessments of the FACT-GP5 are important to account for when interpreting results. The FACT-GP5 may be a useful question for capturing the patient experience of drug tolerability.

3.

Analytical approaches in surgical oncological research.

Alnajar, Ahmed; Akcin, Mehmet; Kutlu, Onur.

Colorectal Dis ; 2024 Jul 19.

Article in English | MEDLINE | ID: mdl-39031910

4.

Is social disadvantage a chronic stressor? Socioeconomic position and HPA axis activity among older adults living in England.

Chatzi, Georgia; Chandola, Tarani; Shlomo, Natalie; Cernat, Alexandru; Hannemann, Tina.

Psychoneuroendocrinology ; 168: 107116, 2024 Jun 28.

Article in English | MEDLINE | ID: mdl-38981200

ABSTRACT

INTRODUCTION: Living in socioeconomic disadvantage has been conceptualised as a chronic stressor, although this contradicts evidence from studies using hair cortisol and cortisone as a measure of hypothalamus-pituitary-adrenal (HPA) axis activity. These studies used complete case analyses, ignoring the impact of missing data for inference, despite the high proportion of missing biomarker data. The methodological limitations of studies investigating the association between socioeconomic position (SEP) defined as education, wealth, and social class and hair cortisol and cortisone are considered in this study by comparing three common methods to deal with missing data: (1) Complete Case Analysis (CCA), (2) Inverse Probability Weighting (IPW) and (3) weighted Multiple Imputation (MI). This study examines if socioeconomic disadvantage is associated with higher levels of HPA axis activity as measured by hair cortisol and cortisone among older adults using three approaches for compensating for missing data. METHOD: Cortisol and cortisone levels in hair samples from 4573 participants in the 6th wave (2012-2013) of the English Longitudinal Study of Ageing (ELSA) were examined, in relation to education, wealth, and social class. We compared linear regression models with CCA, weighted and multiple imputed weighted linear regression models. RESULTS: Social groups with certain characteristics (i.e., ethnic minorities, in routine and manual occupations, physically inactive, with poorer health, and smokers) were less likely to have hair cortisol and hair cortisone data compared to the most advantaged groups. We found a consistent pattern of higher levels of hair cortisol and cortisone among the most socioeconomically disadvantaged groups compared to the most advantaged groups.
Complete case approaches to missing data underestimated the levels of hair cortisol in education and social class and the levels of hair cortisone in education, wealth, and social class in the most disadvantaged groups. CONCLUSION: This study demonstrates that social disadvantage as measured by disadvantaged SEP is associated with increased HPA axis activity. The conceptualisation of social disadvantage as a chronic stressor may be valid and previous studies reporting no associations between SEP and hair cortisol may be biased due to their lack of consideration of missing data cases which showed the underrepresentation of disadvantaged social groups in the analyses. Future analyses using biosocial data may need to consider and adjust for missing data.
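Illustrative sketch (not part of the abstract): the abstract contrasts complete case analysis with inverse probability weighting (IPW). The toy Python example below shows why the two can differ when a group with higher values is less likely to be observed. All numbers are hypothetical, and the response probabilities are assumed to be known; in practice they would be estimated, for example with a response-propensity model.

```python
def complete_case_mean(values):
    """Mean of the observed values only (None marks a missing value)."""
    observed = [v for v in values if v is not None]
    return sum(observed) / len(observed)


def ipw_mean(values, response_prob):
    """Weighted mean in which each observed value gets weight 1 / P(observed)."""
    numerator = 0.0
    denominator = 0.0
    for value, prob in zip(values, response_prob):
        if value is not None:
            weight = 1.0 / prob
            numerator += weight * value
            denominator += weight
    return numerator / denominator


# Hypothetical data: five advantaged people (value 10, 80% observed) and
# five disadvantaged people (value 20, 20% observed).
values = [10, 10, 10, 10, None, 20, None, None, None, None]
response_prob = [0.8] * 5 + [0.2] * 5

cc = complete_case_mean(values)        # dominated by the advantaged group
ipw = ipw_mean(values, response_prob)  # re-weights the under-observed group
```

In this toy setting the complete case mean is pulled toward the well-observed group, while the weighted mean recovers the value that would be obtained if both groups were fully observed.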

5.

Missing genotype imputation in non-model species using self-organizing maps.

Mora-Márquez, Fernando; Nuño, Juan Carlos; Soto, Álvaro; López de Heredia, Unai.

Mol Ecol Resour ; : e13992, 2024 Jul 06.

Article in English | MEDLINE | ID: mdl-38970328

ABSTRACT

Current methodologies of genome-wide single-nucleotide polymorphism (SNP) genotyping produce large amounts of missing data that may affect statistical inference and bias the outcome of experiments. Genotype imputation is routinely used in well-studied species to buffer the impact in downstream analysis, and several algorithms are available to fill in missing genotypes. The lack of reference haplotype panels precludes the use of these methods in genomic studies on non-model organisms. As an alternative, machine learning algorithms are employed to explore the genotype data and to estimate the missing genotypes. Here, we propose an imputation method based on self-organizing maps (SOM), widely used neural networks formed by spatially distributed neurons that cluster similar inputs into close neurons. The method explores genotype datasets to select SNP loci to build binary vectors from the genotypes, and initializes and trains neural networks for each query missing SNP genotype. The SOM-derived clustering is then used to impute the best genotype. To automate the imputation process, we have implemented gtImputation, an open-source application programmed in Python3 and with a user-friendly GUI to facilitate the whole process. The method's performance was validated by comparing its accuracy, precision and sensitivity on several benchmark genotype datasets with other available imputation algorithms. Our approach produced highly accurate and precise genotype imputations even for SNPs with alleles at low frequency and outperformed other algorithms, especially for datasets from mixed populations with unrelated individuals.

6.

Novel approach exploring the correlation between presepsin and routine laboratory parameters using explainable artificial intelligence.

Jeong, Jae-Seung; Kang, Tak Ho; Ju, Hyunsu; Cho, Chi-Hyun.

Heliyon ; 10(13): e33826, 2024 Jul 15.

Article in English | MEDLINE | ID: mdl-39027625

ABSTRACT

Although presepsin, a crucial biomarker for the diagnosis and management of sepsis, has gained prominence in contemporary medical research, its relationship with routine laboratory parameters, including demographic data and hospital blood test data, remains underexplored. This study integrates machine learning with explainable artificial intelligence (XAI) to provide insights into the relationship between presepsin and these parameters. Advanced machine learning classifiers provide a multilateral view of data and play an important role in highlighting the interrelationships between presepsin and other parameters. XAI enhances analysis by ensuring transparency in the model's decisions, especially in selecting key parameters that significantly enhance classification accuracy. Utilizing XAI, this study successfully identified critical parameters that increased the predictive accuracy for sepsis patients, achieving a remarkable ROC AUC of 0.97 and an accuracy of 0.94. This breakthrough is possibly attributed to the comprehensive utilization of XAI in refining parameter selection, thus leading to these significant predictive metrics. The presence of missing data in datasets is another concern; this study addresses it by employing Extreme Gradient Boosting (XGBoost) to manage missing data, effectively mitigating potential biases while preserving both the accuracy and relevance of the results. The perspective of examining data from higher dimensions using machine learning transcends traditional observation and analysis. The findings of this study hold the potential to enhance patient diagnoses and treatment, underscoring the value of merging traditional research methods with advanced analytical tools.

7.

Missing data imputation using classification and regression trees.

Chen, Cheng-Yang; Chang, Yu-Wei.

PeerJ Comput Sci ; 10: e2119, 2024.

Article in English | MEDLINE | ID: mdl-38983189

ABSTRACT

Background: Missing data are common when analyzing real data. One popular solution is to impute missing data so that one complete dataset can be obtained for subsequent data analysis. In the present study, we focus on missing data imputation using classification and regression trees (CART). Methods: We consider a new perspective on missing data in a CART imputation problem and realize the perspective through some resampling algorithms. Several existing missing data imputation methods using CART are compared through simulation studies, and we aim to investigate the methods with better imputation accuracy under various conditions. Some systematic findings are demonstrated and presented. These imputation methods are further applied to two real datasets: Hepatitis data and Credit approval data for illustration. Results: The method that performs the best strongly depends on the correlation between variables. For imputing missing ordinal categorical variables, the rpart package with surrogate variables is recommended under correlations larger than 0 with missing completely at random (MCAR) and missing at random (MAR) conditions. Under missing not at random (MNAR), chi-squared test methods and the rpart package with surrogate variables are suggested. For imputing missing quantitative variables, the iterative imputation method is most recommended under moderate correlation conditions.

8.

Multimodal subtypes identified in Alzheimer's Disease Neuroimaging Initiative participants by missing-data-enabled subtype and stage inference.

Estarellas, Mar; Oxtoby, Neil P; Schott, Jonathan M; Alexander, Daniel C; Young, Alexandra L.

Brain Commun ; 6(4): fcae219, 2024.

Article in English | MEDLINE | ID: mdl-39035417

ABSTRACT

Alzheimer's disease is a highly heterogeneous disease in which different biomarkers are dynamic over different windows of the decades-long pathophysiological processes, and potentially have distinct involvement in different subgroups. Subtype and Stage Inference is an unsupervised learning algorithm that disentangles the phenotypic heterogeneity and temporal progression of disease biomarkers, providing disease insight and quantitative estimates of individual subtype and stage. However, a key limitation of Subtype and Stage Inference is that it requires a complete set of biomarkers for each subject, reducing the number of datapoints available for model fitting and limiting applications of Subtype and Stage Inference to modalities that are widely collected, e.g. volumetric biomarkers derived from structural MRI. In this study, we adapted the Subtype and Stage Inference algorithm to handle missing data, enabling the application of Subtype and Stage Inference to multimodal data (magnetic resonance imaging, positron emission tomography, cerebrospinal fluid and cognitive tests) from 789 participants in the Alzheimer's Disease Neuroimaging Initiative. Missing-data Subtype and Stage Inference identified five subtypes having distinct progression patterns, which we describe by the earliest unique abnormality as 'Typical AD with Early Tau', 'Typical AD with Late Tau', 'Cortical', 'Cognitive' and 'Subcortical'. These new multimodal subtypes were differentially associated with age, years of education, Apolipoprotein E (APOE4) status, white matter hyperintensity burden and the rate of conversion from mild cognitive impairment to Alzheimer's disease, with the 'Cognitive' subtype showing the fastest clinical progression, and the 'Subcortical' subtype the slowest. Overall, we demonstrate that missing-data Subtype and Stage Inference reveals a finer landscape of Alzheimer's disease subtypes, each of which are associated with different risk factors. 
Missing-data Subtype and Stage Inference has broad utility, enabling the prediction of progression in a much wider set of individuals, rather than being restricted to those with complete data.

9.

Multiple Imputation with Factor Scores: A Practical Approach for Handling Simultaneous Missingness Across Items in Longitudinal Designs.

Li, Yanling; Oravecz, Zita; Ji, Linying; Chow, Sy-Miin.

Multivariate Behav Res ; : 1-29, 2024 Jul 12.

Article in English | MEDLINE | ID: mdl-38997153

ABSTRACT

Missingness in intensive longitudinal data triggered by latent factors constitutes one type of nonignorable missingness that can generate simultaneous missingness across multiple items on each measurement occasion. To address this issue, we propose a multiple imputation (MI) strategy called MI-FS, which incorporates factor scores, lag/lead variables, and missing data indicators into the imputation model. In the context of process factor analysis (PFA), we conducted a Monte Carlo simulation study to compare the performance of MI-FS to listwise deletion (LD), MI with manifest variables (MI-MV, which implements MI on both dependent variables and covariates), and partial MI with MVs (PMI-MV, which implements MI on covariates and handles missing dependent variables via full-information maximum likelihood) under different conditions. Across conditions, we found MI-based methods overall outperformed LD; the MI-FS approach yielded lower root mean square errors (RMSEs) and higher coverage rates for auto-regression (AR) parameters compared to MI-MV; and the PMI-MV and MI-MV approaches yielded higher coverage rates for most parameters except AR parameters compared to MI-FS. These approaches were also compared using an empirical example investigating the relationships between negative affect and perceived stress over time. Recommendations on when and how to incorporate factor scores into MI processes were discussed.
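Illustrative sketch (not part of the abstract): the abstract mentions adding lag/lead variables and missing data indicators to the imputation model. The minimal Python sketch below builds those auxiliary columns for one item measured over time. It does not compute factor scores or run any imputation; the function name and the column names are assumptions made for illustration.

```python
def add_lag_lead_indicator(series):
    """Build lag-1, lead-1 and missingness-indicator columns for one series.

    series -- list of measurements over time, with None for missing occasions
    Returns one dict per occasion; lag/lead are None at the series boundaries.
    """
    rows = []
    n = len(series)
    for t, value in enumerate(series):
        rows.append({
            "value": value,
            "lag1": series[t - 1] if t > 0 else None,       # previous occasion
            "lead1": series[t + 1] if t < n - 1 else None,  # next occasion
            "missing": 1 if value is None else 0,           # missingness flag
        })
    return rows


# Hypothetical item scores on four occasions, the second one missing.
rows = add_lag_lead_indicator([3.0, None, 4.0, 5.0])
```

For the missing second occasion, the neighbouring values (3.0 and 4.0) are available as predictors, which is the kind of auxiliary information an imputation model can draw on.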

10.

A novel deep machine learning algorithm with dimensionality and size reduction approaches for feature elimination: thyroid cancer diagnoses with randomly missing data.

Tutsoy, Onder; Sumbul, Hilmi Erdem.

Brief Bioinform ; 25(4), 2024 May 23.

Article in English | MEDLINE | ID: mdl-39007597

ABSTRACT

Thyroid cancer incidence continues to increase even though a large number of inspection tools have been developed recently. Since there is no standard, definitive procedure to follow for thyroid cancer diagnosis, clinicians need to conduct various tests. This scrutiny process yields multi-dimensional big data, and the lack of a common approach leads to randomly distributed missing (sparse) data; both are formidable challenges for machine learning algorithms. This paper aims to develop an accurate and computationally efficient deep learning algorithm to diagnose thyroid cancer. To this end, the singularity in the learning problem caused by randomly distributed missing data is treated, and dimensionality reduction with inner and target similarity approaches is developed to select the most informative input datasets. In addition, size reduction with the hierarchical clustering algorithm is performed to eliminate considerably similar data samples. Four machine learning algorithms are trained and then tested with unseen data to validate their generalization and robustness. The results yield 100% training and 83% testing preciseness for the unseen data. The computational time efficiency of the algorithms is also examined under equal conditions.

Subjects

Algorithms; Deep Learning; Thyroid Neoplasms; Thyroid Neoplasms/diagnosis; Humans; Machine Learning; Cluster Analysis

11.

Incorporating informatively collected laboratory data from EHR in clinical prediction models.

Sun, Minghui; Engelhard, Matthew M; Bedoya, Armando D; Goldstein, Benjamin A.

BMC Med Inform Decis Mak ; 24(1): 206, 2024 Jul 24.

Article in English | MEDLINE | ID: mdl-39049049

ABSTRACT

BACKGROUND: Electronic Health Records (EHR) are widely used to develop clinical prediction models (CPMs). However, one of the challenges is that there is often a degree of informative missing data. For example, laboratory measures are typically taken when a clinician is concerned that there is a need. When data are so-called Not Missing at Random (NMAR), analytic strategies based on other missingness mechanisms are inappropriate. In this work, we seek to compare the impact of different strategies for handling missing data on CPM performance. METHODS: We considered a predictive model for rapid inpatient deterioration as an exemplar implementation. This model incorporated twelve laboratory measures with varying levels of missingness. Five labs had missingness rate levels around 50%, and the other seven had missingness levels around 90%. We included them based on the belief that their missingness status can be highly informational for the prediction. In our study, we explicitly compared the various missing data strategies: mean imputation, normal-value imputation, conditional imputation, categorical encoding, and missingness embeddings. Some of these were also combined with the last observation carried forward (LOCF). We implemented logistic LASSO regression, multilayer perceptron (MLP), and long short-term memory (LSTM) models as the downstream classifiers. We compared the AUROC of testing data and used bootstrapping to construct 95% confidence intervals. RESULTS: We had 105,198 inpatient encounters, with 4.7% having experienced the deterioration outcome of interest. LSTM models generally outperformed other cross-sectional models, where embedding approaches and categorical encoding yielded the best results. For the cross-sectional models, normal-value imputation with LOCF generated the best results. CONCLUSION: Strategies that accounted for the possibility of NMAR missing data yielded better model performance than those that did not.
The embedding method had an advantage as it did not require prior clinical knowledge. Using LOCF could enhance the performance of cross-sectional models but have countereffects in LSTM models.
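Illustrative sketch (not part of the abstract): the abstract combines normal-value imputation with last observation carried forward (LOCF) and treats missingness itself as informative. The minimal Python sketch below shows one way those pieces could fit together for a single laboratory series. It is an assumption-laden illustration, not the authors' pipeline; the function name, the normal value of 4.0 and the sample readings are hypothetical.

```python
def locf_with_indicator(values, normal_value):
    """Fill a lab series by LOCF, falling back to a normal value at the start.

    values       -- measurements in time order, None where the lab was not drawn
    normal_value -- hypothetical clinically normal value used before any draw
    Returns (filled, indicator); indicator is 1 where the lab was not measured,
    so a downstream model can still use the missingness pattern as a feature.
    """
    filled = []
    indicator = []
    last_seen = None
    for value in values:
        if value is None:
            indicator.append(1)
            filled.append(last_seen if last_seen is not None else normal_value)
        else:
            indicator.append(0)
            last_seen = value
            filled.append(value)
    return filled, indicator


# Hypothetical potassium readings over five time steps.
filled, indicator = locf_with_indicator([None, 4.1, None, None, 5.0], 4.0)
```

Keeping the indicator next to the filled values preserves the information that a lab was not ordered, which the abstract suggests can itself be predictive.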

Subjects

Electronic Health Records; Humans; Clinical Deterioration; Models, Statistical; Clinical Laboratory Techniques

12.

Injury severity bias in missing prehospital vital signs: Prevalence and implications for trauma registries.

O'Neill, Melissa; Cheskes, Sheldon; Drennan, Ian; Keown-Stoneman, Charles; Lin, Steve; Nolan, Brodie.

Injury ; : 111747, 2024 Jul 18.

Article in English | MEDLINE | ID: mdl-39054233

ABSTRACT

BACKGROUND: Vital signs are important factors in assessing injury severity and guiding trauma resuscitation, especially among severely injured patients. Despite this, physiological data are frequently missing from trauma registries. This study aimed to evaluate the extent of missing prehospital data in a hospital-based trauma registry and to assess the associations between prehospital physiological data completeness and indicators of injury severity. METHODS: A retrospective review was conducted on all adult trauma patients brought directly to a level 1 trauma center in Toronto, Ontario by paramedics from January 1, 2015, to December 31, 2019. The proportion of missing data was evaluated for each variable and patterns of missingness were assessed. To investigate the associations between prehospital data completeness and injury severity factors, descriptive and unadjusted logistic regression analyses were performed. RESULTS: A total of 3,528 patients were included. We considered prehospital data missing if any of heart rate, systolic blood pressure, respiratory rate or oxygen saturation were incomplete. Each individual variable was missing from the registry in approximately 20 % of patients, with oxygen saturation missing most frequently (n = 831; 23.6 %). Over 25 % (n = 909) of patients were missing at least one prehospital vital sign, of which 69.1 % (n = 628) were missing all four of these variables. Patients with incomplete data were more severely injured, had higher mortality, and more frequently received lifesaving interventions such as blood transfusion and intubation. Patients were most likely to have missing prehospital physiological data if they died in the trauma bay (unadjusted OR: 9.79; 95 % CI: 6.35-15.10), did not survive to discharge (unadjusted OR: 3.55; 95 % CI: 2.76-4.55), or had a prehospital GCS less than 9 (OR: 3.24; 95 % CI: 2.59-4.06). 
CONCLUSION: In this single center trauma registry, key prehospital variables were frequently missing, particularly among more severely injured patients. Patients with missing data had higher mortality, more severe injury characteristics and received more life-saving interventions in the trauma bay, suggesting an injury severity bias in prehospital vital sign missingness. To ensure the validity of research based on trauma registry data, patterns of missingness must be carefully considered to ensure missing data is appropriately addressed.
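Illustrative sketch (not part of the abstract): the abstract reports unadjusted odds ratios with 95% confidence intervals. The minimal Python sketch below shows the standard calculation from a 2x2 table using the log odds ratio and its Wald standard error. The counts are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a -- exposed, outcome present      b -- exposed, outcome absent
    c -- unexposed, outcome present    d -- unexposed, outcome absent
    All four cells must be positive; z=1.96 gives an approximate 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: outcome = missing vital signs, exposure = low GCS.
odds_ratio, lower, upper = odds_ratio_ci(30, 10, 20, 40)
```

An interval that excludes 1 indicates an association between the exposure and missingness at roughly the 5% level, without adjustment for other factors.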

13.

Optimal model averaging for partially linear models with missing response variables and error-prone covariates.

Liang, Zhongqi; Wang, Suojin; Cai, Li.

Stat Med ; 2024 Jul 25.

Article in English | MEDLINE | ID: mdl-39054668

ABSTRACT

We consider the problem of optimal model averaging for partially linear models when the responses are missing at random and some covariates are measured with error. A novel weight choice criterion based on the Mallows-type criterion is proposed for the weight vector to be used in the model averaging. The resulting model averaging estimator for the partially linear models is shown to be asymptotically optimal under some regularity conditions in terms of achieving the smallest possible squared loss. In addition, the existence of a local minimizing weight vector and its convergence rate to the risk-based optimal weight vector are established. Simulation studies suggest that the proposed model averaging method generally outperforms existing methods. As an illustration, the proposed method is applied to analyze an HIV-CD4 dataset.

14.

Bayesian Simultaneous Factorization and Prediction Using Multi-Omic Data.

Samorodnitsky, Sarah; Wendt, Chris H; Lock, Eric F.

Comput Stat Data Anal ; 197, 2024 Sep.

Article in English | MEDLINE | ID: mdl-38947282

ABSTRACT

Integrative factorization methods for multi-omic data estimate factors explaining biological variation. Factors can be treated as covariates to predict an outcome and the factorization can be used to impute missing values. However, no available methods provide a comprehensive framework for statistical inference and uncertainty quantification for these tasks. A novel framework, Bayesian Simultaneous Factorization (BSF), is proposed to decompose multi-omics variation into joint and individual structures simultaneously within a probabilistic framework. BSF uses conjugate normal priors and the posterior mode of this model can be estimated by solving a structured nuclear norm-penalized objective that also achieves rank selection and motivates the choice of hyperparameters. BSF is then extended to simultaneously predict a continuous or binary phenotype while estimating latent factors, termed Bayesian Simultaneous Factorization and Prediction (BSFP). BSF and BSFP accommodate concurrent imputation, i.e., imputation during the model-fitting process, and full posterior inference for missing data, including "blockwise" missingness. It is shown via simulation that BSFP is competitive in recovering latent variation structure, and the importance of accounting for uncertainty in the estimated factorization within the predictive model is demonstrated. The imputation performance of BSF is examined via simulation under missing-at-random and missing-not-at-random assumptions. Finally, BSFP is used to predict lung function based on the bronchoalveolar lavage metabolome and proteome from a study of HIV-associated obstructive lung disease, revealing multi-omic patterns related to lung function decline and a cluster of patients with obstructive lung disease driven by shared metabolomic and proteomic abundance patterns.

15.

Testing unit root non-stationarity in the presence of missing data in univariate time series of mobile health studies.

Fowler, Charlotte; Cai, Xiaoxuan; Baker, Justin T; Onnela, Jukka-Pekka; Valeri, Linda.

J R Stat Soc Ser C Appl Stat ; 73(3): 755-773, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38883261

ABSTRACT

The use of digital devices to collect data in mobile health studies introduces a novel application of time series methods, with the constraint of potential data missing at random or missing not at random (MNAR). In time-series analysis, testing for stationarity is an important preliminary step to inform appropriate subsequent analyses. The Dickey-Fuller test evaluates the null hypothesis of unit root non-stationarity, under no missing data. Beyond recommendations under data missing completely at random for complete case analysis or last observation carry forward imputation, researchers have not extended unit root non-stationarity testing to more complex missing data mechanisms. Multiple imputation with chained equations, Kalman smoothing imputation, and linear interpolation have also been used for time-series data, however such methods impose constraints on the autocorrelation structure and impact unit root testing. We propose maximum likelihood estimation and multiple imputation using state space model approaches to adapt the augmented Dickey-Fuller test to a context with missing data. We further develop sensitivity analyses to examine the impact of MNAR data. We evaluate the performance of existing and proposed methods across missing mechanisms in extensive simulations and in their application to a multi-year smartphone study of bipolar patients.

16.

The impact of different imputation methods on estimates and model performance: an example using a risk prediction model for premature mortality.

Hurst, Mackenzie; O'Neill, Meghan; Pagalan, Lief; Diemert, Lori M; Rosella, Laura C.

Popul Health Metr ; 22(1): 13, 2024 Jun 17.

Article in English | MEDLINE | ID: mdl-38886744

ABSTRACT

OBJECTIVE: To compare how different imputation methods affect the estimates and performance of a prediction model for premature mortality. STUDY DESIGN AND SETTING: Sex-specific Weibull accelerated failure time survival models were run on four separate datasets using complete case, mode, single and multiple imputation to impute missing values. Six performance measures were compared to assess predictive accuracy (Nagelkerke R2, integrated Brier score), discrimination (Harrell's c-index, discrimination slope) and calibration (calibration in the large, calibration slope). RESULTS: The highest proportion of missingness for a single variable was 10.86% for the female model and 8.24% for the male model. Comparing the performance measures for complete case, mode, single and multiple imputation: the Nagelkerke R2 values for the female model were 0.1084, 0.1116, 0.1120 and 0.111-0.1120, while the male model exhibited similar variation, with values of 0.1050, 0.1078, 0.1078 and 0.1078-0.1081. Harrell's c-index also demonstrated small variation with values of 0.8666, 0.8719, 0.8719 and 0.8711-0.8719 for the female model and 0.8549, 0.8548, 0.8550 and 0.8550-0.8553 for the male model. CONCLUSION: In the scenarios examined in this study, mode imputation performed well when using a population health survey compared to single and multiple imputation when predictive performance is the main model goal. To generate unbiased hazard ratios, multiple imputation methods were superior. This study shows the need to consider the best imputation approach for predictive model development given the conditions of missing data and the goals of the analysis.
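Illustrative sketch (not part of the abstract): the abstract uses Harrell's c-index as a discrimination measure. The minimal Python sketch below computes it for right-censored data by counting comparable pairs, that is, pairs in which the subject with the shorter follow-up time had an observed event. Pairs with tied times are skipped for simplicity, which is an assumption of this sketch; the data are hypothetical.

```python
def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times  -- follow-up time per subject
    events -- 1 if the event was observed, 0 if the subject was censored
    risks  -- predicted risk score (higher means earlier expected event)
    A pair (i, j) is comparable when i had an event strictly before time j.
    Tied risk scores count as half concordant; tied times are ignored here.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort of four subjects; the second one is censored.
c_index = harrell_c_index(
    times=[2, 4, 6, 8],
    events=[1, 0, 1, 1],
    risks=[0.9, 0.3, 0.1, 0.2],
)
```

Here three of the four comparable pairs are ordered correctly by the risk score; the one discordant pair is the third subject, who had an event earlier than the fourth despite a lower predicted risk.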

Subjects

Premature Mortality, Humans, Male, Female, Statistical Models, Risk Assessment/methods, Middle Aged, Statistical Data Interpretation, Adult
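For readers unfamiliar with the discrimination measure reported in the entry above, here is a minimal sketch of Harrell's c-index on right-censored data. The function name and the toy follow-up data are hypothetical; ties in risk score are counted as half-concordant, which is a common convention but an assumption here, and the study's own software may differ.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Fraction of comparable pairs ordered correctly by risk score.

    A pair (i, j) is comparable when subject i has an observed event
    and a strictly shorter follow-up time than subject j.
    Tied risk scores count as half-concordant (assumed convention).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up time, event indicator (1 = died), predicted risk.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.4, 0.5, 0.1]
print(harrell_c_index(toy_times, toy_events, toy_risks))  # 0.8
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so differences in the third or fourth decimal place, as in the abstract, are small.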

17.

Simple graphical rules for assessing selection bias in general-population and selected-sample treatment effects.

Mathur, Maya B; Shpitser, Ilya.

Am J Epidemiol ; 2024 Jun 20.

Article in English | MEDLINE | ID: mdl-38904459

ABSTRACT

When analyzing a selected sample from a general population, selection bias can arise relative to the causal average treatment effect (ATE) for the general population, and also relative to the ATE for the selected sample itself. We provide simple graphical rules that indicate: (1) whether a selected-sample analysis will be unbiased for each ATE; (2) whether adjusting for certain covariates could eliminate selection bias. The rules can easily be checked in a standard single-world intervention graph. When the treatment could affect selection, a third estimand of potential scientific interest is the "net treatment difference", namely the net change in outcomes that would occur for the selected sample if all members of the general population were treated versus not treated, including any effects of the treatment on which individuals are in the selected sample. We provide graphical rules for this estimand as well. We decompose bias in a selected-sample analysis relative to the general-population ATE into: (1) "internal bias" relative to the net treatment difference; (2) "net-external bias", a discrepancy between the net treatment difference and the general-population ATE. Each bias can be assessed unambiguously via a distinct graphical rule, providing new conceptual insight into the mechanisms by which certain causal structures produce selection bias.

18.

Relationship between reasons for intermittent missing patient-reported outcomes data and missing data mechanisms.

Nielsen, Lene Kongsgaard; Mercieca-Bebber, Rebecca; Möller, Sören; Redder, Louise; Jarden, Mary; Andersen, Christen Lykkegaard; Frederiksen, Henrik; Svirskaite, Asta; Silkjær, Trine; Steffensen, Morten Saaby; Pedersen, Per Trøllund; Hinge, Maja; Frederiksen, Mikael; Jensen, Bo Amdi; Helleberg, Carsten; Mylin, Anne Kærsgaard; Abildgaard, Niels; King, Madeleine T.

Qual Life Res ; 2024 Jun 16.

Article in English | MEDLINE | ID: mdl-38879861

ABSTRACT

PURPOSE: Non-response (NR) to patient-reported outcome (PRO) questionnaires may cause bias if not handled appropriately. Collecting reasons for NR is recommended, but how reasons for NR are related to missing data mechanisms remains unexplored. We aimed to explore this relationship for intermittent NRs. METHODS: Patients with multiple myeloma completed validated PRO questionnaires at enrolment and 12 follow-up time-points. NR was defined as non-completion of a follow-up assessment within seven days, which triggered contact with the patient, recording of the reason for missingness and an invitation to complete the questionnaire (denoted "salvage response"). Mean differences between salvage and previous on-time scores were estimated for groups defined by reasons for NR using linear regression with clustered standard errors. Statistically significant mean differences larger than minimal important difference thresholds were interpreted as indicating a "missing not at random" (MNAR) mechanism (i.e. assumed to be related to declining health), and the remainder were interpreted as aligned with a "missing completely at random" (MCAR) mechanism (i.e. assumed unrelated to changes in health). RESULTS: Most follow-up questionnaires (7228/7534, 96%) were completed; 11% (802/7534) were salvage responses. Mean salvage scores were compared to previous on-time scores by reason: those due to hospital admission, mental or physical reasons were worse in 10/22 PRO domains; those due to technical difficulties/procedural errors were no different in 21/22 PRO domains; and those due to overlooked/forgotten or other/unspecified reasons were no different in any domains. CONCLUSION: Intermittent NRs due to hospital admission, mental or physical reasons were aligned with an MNAR mechanism for nearly half of PRO domains, while intermittent NRs due to technical difficulties/procedural errors or other/unspecified reasons were generally aligned with an MCAR mechanism.
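The interpretation rule described in the METHODS of the entry above can be sketched as a small function. The function name, the 0.05 significance level, and the use of the absolute mean difference are assumptions made for illustration; the abstract does not state these details.

```python
def classify_mechanism(
    mean_diff: float,
    p_value: float,
    mid_threshold: float,
    alpha: float = 0.05,  # assumed significance level, not given in the abstract
) -> str:
    """Label a reason-for-non-response group as MNAR or MCAR.

    MNAR: the salvage-vs-on-time mean difference is statistically
    significant AND larger in magnitude than the minimal important
    difference (MID). Everything else is treated as MCAR.
    """
    significant = p_value < alpha
    exceeds_mid = abs(mean_diff) > mid_threshold
    return "MNAR" if significant and exceeds_mid else "MCAR"


# Hypothetical inputs: an 8-point drop with p = 0.01 against a MID of 5 points.
print(classify_mechanism(mean_diff=-8.0, p_value=0.01, mid_threshold=5.0))  # MNAR
```

Note that both conditions must hold: a large but non-significant difference, or a significant but sub-threshold one, falls into the MCAR-aligned group under this rule.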

19.

Incident Tuberculosis Infection is Associated with Alcohol use in Adults in Rural Uganda.

Abbott, Rachel; Landsiedel, Kirsten; Atukunda, Mucunguzi; Puryear, Sarah B; Chamie, Gabriel; Hahn, Judith A; Mwangwa, Florence; Kakande, Elijah; Petersen, Maya L; Havlir, Diane V; Charlebois, Edwin; Balzer, Laura B; Kamya, Moses R; Marquez, Carina.

Clin Infect Dis ; 2024 Jun 02.

Article in English | MEDLINE | ID: mdl-38824440

ABSTRACT

Data on alcohol use and incident Tuberculosis (TB) infection are needed. In adults aged 15+ in rural Uganda (N=49,585), estimated risk of incident TB infection was 29.2% with alcohol use vs. 19.2% without (RR: 1.49; 95%CI: 1.40-1.60). There is potential for interventions to interrupt transmission among people who drink alcohol.
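As a quick arithmetic check on the figures quoted in the entry above, the sketch below divides the two rounded risks directly. The variable names are hypothetical. The abstract does not say how its RR of 1.49 was estimated, so the simple ratio here is only a crude comparison and need not match the published estimate.

```python
# Risks of incident TB infection as quoted in the abstract (rounded percentages).
risk_with_alcohol = 0.292
risk_without_alcohol = 0.192

# Crude ratio and difference of the two rounded risks.
crude_ratio = risk_with_alcohol / risk_without_alcohol
risk_difference = risk_with_alcohol - risk_without_alcohol

print(f"crude ratio of rounded risks: {crude_ratio:.2f}")  # 1.52
print(f"risk difference: {risk_difference:.3f}")           # 0.100
```

The crude ratio of about 1.52 is close to, but not identical with, the reported 1.49; the difference presumably reflects rounding of the risks or the study's own estimation approach, which is an assumption here.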

20.

Trial Analysis of Brain Activity Information for the Presymptomatic Disease Detection of Rheumatoid Arthritis.

Maeda, Keisuke; Ogawa, Takahiro; Kayama, Tasuku; Sasaki, Takuya; Tainaka, Kazuki; Murakami, Masaaki; Haseyama, Miki.

Bioengineering (Basel) ; 11(6), 2024 May 21.

Article in English | MEDLINE | ID: mdl-38927759

ABSTRACT

This study presents a trial analysis that uses brain activity information obtained from mice to detect rheumatoid arthritis (RA) in its presymptomatic stages. Specifically, we confirmed that F759 mice, serving as a mouse model of RA that is dependent on the inflammatory cytokine IL-6, and healthy wild-type mice can be classified on the basis of brain activity information. We clarified which brain regions are useful for the presymptomatic detection of RA. We introduced a matrix completion-based approach to handle missing brain activity information to perform the aforementioned analysis. In addition, we implemented a canonical correlation-based method capable of analyzing the relationship between various types of brain activity information. This method allowed us to accurately classify F759 and wild-type mice, thereby identifying essential features, including crucial brain regions, for the presymptomatic detection of RA. Our experiment obtained brain activity information from 15 F759 and 10 wild-type mice and analyzed the acquired data. By employing four types of classifiers, our experimental results show that the thalamus and periaqueductal gray are effective for the classification task. Furthermore, we confirmed that classification performance was maximized when seven brain regions were used, excluding the electromyogram and nucleus accumbens.
