Results 1 - 20 of 1,530
1.
PeerJ Comput Sci ; 10: e2119, 2024.
Article in English | MEDLINE | ID: mdl-38983189

ABSTRACT

Background: Missing data are common when analyzing real data. One popular solution is to impute missing data so that one complete dataset can be obtained for subsequent data analysis. In the present study, we focus on missing data imputation using classification and regression trees (CART). Methods: We consider a new perspective on missing data in a CART imputation problem and realize the perspective through some resampling algorithms. Several existing missing data imputation methods using CART are compared through simulation studies, and we aim to investigate the methods with better imputation accuracy under various conditions. Some systematic findings are demonstrated and presented. These imputation methods are further applied to two real datasets: Hepatitis data and Credit approval data for illustration. Results: The method that performs the best strongly depends on the correlation between variables. For imputing missing ordinal categorical variables, the rpart package with surrogate variables is recommended under correlations larger than 0 with missing completely at random (MCAR) and missing at random (MAR) conditions. Under missing not at random (MNAR), chi-squared test methods and the rpart package with surrogate variables are suggested. For imputing missing quantitative variables, the iterative imputation method is most recommended under moderate correlation conditions.

2.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-39007597

ABSTRACT

Thyroid cancer incidence continues to increase even though a large number of inspection tools have been developed recently. Since there is no standard, definitive procedure for diagnosing thyroid cancer, clinicians must conduct various tests. This scrutiny process yields multi-dimensional big data, and the lack of a common approach leads to randomly distributed missing (sparse) data; both are formidable challenges for machine learning algorithms. This paper aims to develop an accurate and computationally efficient deep learning algorithm to diagnose thyroid cancer. To this end, the singularity in learning problems caused by randomly distributed missing data is treated, and dimensionality reduction with inner and target similarity approaches is developed to select the most informative input datasets. In addition, size reduction with the hierarchical clustering algorithm is performed to eliminate highly similar data samples. Four machine learning algorithms are trained and then tested on unseen data to validate their generalization and robustness. The results yield 100% training preciseness and 83% testing preciseness on the unseen data. The computational time efficiency of the algorithms is also examined under equal conditions.


Subjects
Algorithms, Deep Learning, Thyroid Neoplasms, Thyroid Neoplasms/diagnosis, Humans, Machine Learning, Cluster Analysis
3.
Article in English | MEDLINE | ID: mdl-38947282

ABSTRACT

Integrative factorization methods for multi-omic data estimate factors explaining biological variation. Factors can be treated as covariates to predict an outcome, and the factorization can be used to impute missing values. However, no available methods provide a comprehensive framework for statistical inference and uncertainty quantification for these tasks. A novel framework, Bayesian Simultaneous Factorization (BSF), is proposed to decompose multi-omics variation into joint and individual structures simultaneously within a probabilistic framework. BSF uses conjugate normal priors, and the posterior mode of this model can be estimated by solving a structured nuclear norm-penalized objective that also achieves rank selection and motivates the choice of hyperparameters. BSF is then extended to simultaneously predict a continuous or binary phenotype while estimating latent factors, termed Bayesian Simultaneous Factorization and Prediction (BSFP). BSF and BSFP accommodate concurrent imputation, i.e., imputation during the model-fitting process, and full posterior inference for missing data, including "blockwise" missingness. Simulations show that BSFP is competitive in recovering latent variation structure and demonstrate the importance of accounting for uncertainty in the estimated factorization within the predictive model. The imputation performance of BSF is examined via simulation under missing-at-random and missing-not-at-random assumptions. Finally, BSFP is used to predict lung function based on the bronchoalveolar lavage metabolome and proteome from a study of HIV-associated obstructive lung disease, revealing multi-omic patterns related to lung function decline and a cluster of patients with obstructive lung disease driven by shared metabolomic and proteomic abundance patterns.

4.
Psychoneuroendocrinology ; 168: 107116, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38981200

ABSTRACT

INTRODUCTION: Living in socioeconomic disadvantage has been conceptualised as a chronic stressor, although this contradicts evidence from studies using hair cortisol and cortisone as a measure of hypothalamus-pituitary-adrenal (HPA) axis activity. These studies used complete case analyses, ignoring the impact of missing data for inference, despite the high proportion of missing biomarker data. The methodological limitations of studies investigating the association between socioeconomic position (SEP), defined as education, wealth, and social class, and hair cortisol and cortisone are considered in this study by comparing three common methods to deal with missing data: (1) Complete Case Analysis (CCA), (2) Inverse Probability Weighting (IPW) and (3) weighted Multiple Imputation (MI). This study examines if socioeconomic disadvantage is associated with higher levels of HPA axis activity as measured by hair cortisol and cortisone among older adults using three approaches for compensating for missing data. METHOD: Cortisol and cortisone levels in hair samples from 4573 participants in the 6th wave (2012-2013) of the English Longitudinal Study of Ageing (ELSA) were examined, in relation to education, wealth, and social class. We compared linear regression models with CCA, weighted and multiple imputed weighted linear regression models. RESULTS: Social groups with certain characteristics (i.e., ethnic minorities, in routine and manual occupations, physically inactive, with poorer health, and smokers) were less likely to have hair cortisol and hair cortisone data compared to the most advantaged groups. We found a consistent pattern of higher levels of hair cortisol and cortisone among the most socioeconomically disadvantaged groups compared to the most advantaged groups. Complete case approaches to missing data underestimated the levels of hair cortisol in education and social class and the levels of hair cortisone in education, wealth, and social class in the most disadvantaged groups. CONCLUSION: This study demonstrates that social disadvantage as measured by disadvantaged SEP is associated with increased HPA axis activity. The conceptualisation of social disadvantage as a chronic stressor may be valid and previous studies reporting no associations between SEP and hair cortisol may be biased due to their lack of consideration of missing data cases which showed the underrepresentation of disadvantaged social groups in the analyses. Future analyses using biosocial data may need to consider and adjust for missing data.
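To illustrate why a complete case analysis can underestimate a mean when the group with higher values responds less often, here is a minimal Python sketch comparing a complete case mean with an inverse-probability-weighted mean. The toy records and the response probabilities are hypothetical and are treated as known; in a real analysis the probabilities would have to be estimated from a response model, and this is not the study's actual procedure.

```python
# Each record is (outcome, observed?, probability of being observed).
# All values are made up for illustration.
records = [
    # Advantaged group: lower outcome, 80% chance of having data.
    (10.0, True, 0.8), (10.0, True, 0.8), (10.0, True, 0.8),
    (10.0, True, 0.8), (10.0, False, 0.8),
    # Disadvantaged group: higher outcome, 40% chance of having data.
    (20.0, True, 0.4), (20.0, True, 0.4), (20.0, False, 0.4),
    (20.0, False, 0.4), (20.0, False, 0.4),
]


def complete_case_mean(rows):
    """Average the outcome over observed records only."""
    observed = [y for y, seen, _ in rows if seen]
    return sum(observed) / len(observed)


def ipw_mean(rows):
    """Weight each observed record by 1 / P(observed) (Hajek-style estimator)."""
    numerator = sum(y / p for y, seen, p in rows if seen)
    denominator = sum(1.0 / p for _, seen, p in rows if seen)
    return numerator / denominator


full_mean = sum(y for y, _, _ in records) / len(records)
cc_mean = complete_case_mean(records)
weighted_mean = ipw_mean(records)
```

In this toy example the complete case mean falls below the full-data mean because the high-outcome group is underrepresented among observed records, while the weighted mean recovers it.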

5.
Mol Ecol Resour ; : e13992, 2024 Jul 06.
Article in English | MEDLINE | ID: mdl-38970328

ABSTRACT

Current methodologies of genome-wide single-nucleotide polymorphism (SNP) genotyping produce large amounts of missing data that may affect statistical inference and bias the outcome of experiments. Genotype imputation is routinely used in well-studied species to buffer the impact in downstream analysis, and several algorithms are available to fill in missing genotypes. The lack of reference haplotype panels precludes the use of these methods in genomic studies on non-model organisms. As an alternative, machine learning algorithms are employed to explore the genotype data and to estimate the missing genotypes. Here, we propose an imputation method based on self-organizing maps (SOM), a widely used type of neural network formed by spatially distributed neurons that cluster similar inputs into close neurons. The method explores genotype datasets to select SNP loci to build binary vectors from the genotypes, and initializes and trains neural networks for each query missing SNP genotype. The SOM-derived clustering is then used to impute the best genotype. To automate the imputation process, we have implemented gtImputation, an open-source application programmed in Python3 and with a user-friendly GUI to facilitate the whole process. The method's performance was validated by comparing its accuracy, precision and sensitivity on several benchmark genotype datasets with other available imputation algorithms. Our approach produced highly accurate and precise genotype imputations even for SNPs with alleles at low frequency and outperformed other algorithms, especially for datasets from mixed populations with unrelated individuals.

6.
Comput Methods Programs Biomed ; 254: 108308, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38968829

ABSTRACT

BACKGROUND AND OBJECTIVE: In the field of lung cancer research, particularly in the analysis of overall survival (OS), artificial intelligence (AI) serves crucial roles with specific aims. Given the prevalent issue of missing data in the medical domain, our primary objective is to develop an AI model capable of dynamically handling this missing data. Additionally, we aim to leverage all accessible data, effectively analyzing both uncensored patients who have experienced the event of interest and censored patients who have not, by embedding a specialized technique within our AI model, not commonly utilized in other AI tasks. Through the realization of these objectives, our model aims to provide precise OS predictions for non-small cell lung cancer (NSCLC) patients, thus overcoming these significant challenges. METHODS: We present a novel approach to survival analysis with missing values in the context of NSCLC, which exploits the strengths of the transformer architecture to account only for available features without requiring any imputation strategy. More specifically, this model tailors the transformer architecture to tabular data by adapting its feature embedding and masked self-attention to mask missing data and fully exploit the available ones. By making use of ad-hoc designed losses for OS, it is able to account for both censored and uncensored patients, as well as changes in risks over time. RESULTS: We compared our method with state-of-the-art models for survival analysis coupled with different imputation strategies. We evaluated the results obtained over a period of 6 years using different time granularities obtaining a Ct-index, a time-dependent variant of the C-index, of 71.97, 77.58 and 80.72 for time units of 1 month, 1 year and 2 years, respectively, outperforming all state-of-the-art methods regardless of the imputation method used. 
CONCLUSIONS: The results show that our model not only outperforms the state-of-the-art's performance but also simplifies the analysis in the presence of missing data, by effectively eliminating the need to identify the most appropriate imputation strategy for predicting OS in NSCLC patients.
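The idea of masking missing features inside attention, rather than imputing them, can be sketched in a few lines of Python. This is only an illustration of a masked softmax over hypothetical attention scores; the function name, the scores and the availability flags are assumptions, and it does not reproduce the authors' architecture or losses.

```python
import math


def masked_softmax(scores, available):
    """Softmax over scores that gives exactly zero weight to missing features."""
    kept = [s for s, ok in zip(scores, available) if ok]
    if not kept:
        raise ValueError("at least one feature must be available")
    top = max(kept)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) if ok else 0.0 for s, ok in zip(scores, available)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three features; the second one is missing.
weights = masked_softmax([2.0, 1.0, 0.5], [True, False, True])
```

The missing feature receives no attention weight, so whatever placeholder value it carries cannot influence the weighted sum, and the remaining weights still sum to one.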

7.
Am J Epidemiol ; 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38904459

ABSTRACT

When analyzing a selected sample from a general population, selection bias can arise relative to the causal average treatment effect (ATE) for the general population, and also relative to the ATE for the selected sample itself. We provide simple graphical rules that indicate: (1) if a selected-sample analysis will be unbiased for each ATE; (2) whether adjusting for certain covariates could eliminate selection bias. The rules can easily be checked in a standard single-world intervention graph. When the treatment could affect selection, a third estimand of potential scientific interest is the "net treatment difference", namely the net change in outcomes that would occur for the selected sample if all members of the general population were treated versus not treated, including any effects of the treatment on which individuals are in the selected sample. We provide graphical rules for this estimand as well. We decompose bias in a selected-sample analysis relative to the general-population ATE into: (1) "internal bias" relative to the net treatment difference; (2) "net-external bias", a discrepancy between the net treatment difference and the general-population ATE. Each bias can be assessed unambiguously via a distinct graphical rule, providing new conceptual insight into the mechanisms by which certain causal structures produce selection bias.
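The decomposition described in this abstract can be written as a simple identity. The notation below is an assumption introduced for illustration and does not come from the abstract: \(\hat{\Delta}_{S}\) denotes the quantity estimated by the selected-sample analysis, \(\mathrm{NTD}\) the net treatment difference, and \(\mathrm{ATE}_{G}\) the general-population ATE. The sign convention for each term is also assumed.

```latex
\underbrace{\hat{\Delta}_{S} - \mathrm{ATE}_{G}}_{\text{total bias}}
  \;=\;
\underbrace{\hat{\Delta}_{S} - \mathrm{NTD}}_{\text{internal bias}}
  \;+\;
\underbrace{\mathrm{NTD} - \mathrm{ATE}_{G}}_{\text{net-external bias}}
```

Written this way, either component can be zero while the other is not, which is consistent with the abstract's statement that each bias is assessed by its own graphical rule.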

8.
J R Stat Soc Ser C Appl Stat ; 73(3): 755-773, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38883261

ABSTRACT

The use of digital devices to collect data in mobile health studies introduces a novel application of time series methods, with the constraint of potential data missing at random or missing not at random (MNAR). In time-series analysis, testing for stationarity is an important preliminary step to inform appropriate subsequent analyses. The Dickey-Fuller test evaluates the null hypothesis of unit root non-stationarity, under no missing data. Beyond recommendations under data missing completely at random for complete case analysis or last observation carry forward imputation, researchers have not extended unit root non-stationarity testing to more complex missing data mechanisms. Multiple imputation with chained equations, Kalman smoothing imputation, and linear interpolation have also been used for time-series data; however, such methods impose constraints on the autocorrelation structure and impact unit root testing. We propose maximum likelihood estimation and multiple imputation using state space model approaches to adapt the augmented Dickey-Fuller test to a context with missing data. We further develop sensitivity analyses to examine the impact of MNAR data. We evaluate the performance of existing and proposed methods across missing mechanisms in extensive simulations and in their application to a multi-year smartphone study of bipolar patients.
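Two of the simple gap-filling approaches named in this abstract, last observation carried forward and linear interpolation, can be sketched as follows. This is a minimal illustration on a hypothetical series with `None` marking missing points; it is not the state space approach the authors propose, and the function names are assumptions.

```python
def locf(series):
    """Last observation carried forward; leading gaps stay None."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def linear_interpolate(series):
    """Fill interior gaps on a straight line between the neighbouring observations."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled  # leading and trailing gaps are left as None


observed = [1.0, None, None, 4.0, None, 6.0]
carried = locf(observed)
interpolated = linear_interpolate(observed)
```

Both fills create runs of values that are smoother than real data would be (flat segments or straight lines), which gives some intuition for the abstract's remark that such methods constrain the autocorrelation structure.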

9.
Drug Alcohol Depend ; 261: 111368, 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38896944

ABSTRACT

BACKGROUND: High levels of missing outcome data for biologically confirmed substance use (BCSU) threaten the validity of substance use disorder (SUD) clinical trials. Underlying attributes of clinical trials could explain BCSU missingness and identify targets for improved trial design. METHODS: We reviewed 21 clinical trials funded by the NIDA National Drug Abuse Treatment Clinical Trials Network (CTN) and published from 2005 to 2018 that examined pharmacologic and psychosocial interventions for SUD. We used configurational analysis (a Boolean algebra approach that identifies an attribute or combination of attributes predictive of an outcome) to identify trial design features and participant characteristics associated with high levels of BCSU missingness. Associations were identified by configuration complexity, consistency, coverage, and robustness. We limited results using a consistency threshold of 0.75 and summarized model fit using the product of consistency and coverage. RESULTS: For trial design features, the final solution consisted of two pathways: psychosocial treatment as a trial intervention OR larger trial arm size (complexity=2, consistency=0.79, coverage=0.93, robustness score=0.71). For participant characteristics, the final solution consisted of two pathways: interventions targeting individuals with poly- or nonspecific substance use OR younger age (complexity=2, consistency=0.75, coverage=0.86, robustness score=1.00). CONCLUSIONS: Psychosocial treatments, larger trial arm size, interventions targeting individuals with poly- or nonspecific substance use, and younger age among trial participants were predictive of missing BCSU data in SUD clinical trials. Interventions to mitigate missing data that focus on these attributes may reduce threats to validity and improve utility of SUD clinical trials.
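Consistency and coverage, the two fit measures reported in this abstract, are commonly defined as simple proportions over cases. The Python sketch below uses those standard definitions on ten hypothetical cases; the data are invented and the function name is an assumption, so this is not the authors' analysis.

```python
def consistency_and_coverage(cases):
    """cases: list of (has_configuration, has_outcome) booleans.

    consistency = cases with configuration and outcome / cases with configuration
    coverage    = cases with configuration and outcome / cases with outcome
    """
    both = sum(1 for config, outcome in cases if config and outcome)
    with_config = sum(1 for config, _ in cases if config)
    with_outcome = sum(1 for _, outcome in cases if outcome)
    return both / with_config, both / with_outcome


# Hypothetical cases, not data from the reviewed trials.
cases = (
    [(True, True)] * 6      # configuration present, outcome present
    + [(True, False)] * 2   # configuration present, outcome absent
    + [(False, True)] * 1   # outcome present without the configuration
    + [(False, False)] * 1  # neither
)

consistency, coverage = consistency_and_coverage(cases)
fit = consistency * coverage  # product used as a single summary of model fit
```

Applying the same product to the values reported in the abstract gives roughly 0.79 x 0.93 = 0.73 for trial design features and 0.75 x 0.86 = 0.65 for participant characteristics.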

10.
Popul Health Metr ; 22(1): 13, 2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38886744

ABSTRACT

OBJECTIVE: To compare how different imputation methods affect the estimates and performance of a prediction model for premature mortality. STUDY DESIGN AND SETTING: Sex-specific Weibull accelerated failure time survival models were run on four separate datasets using complete case, mode, single and multiple imputation to impute missing values. Six performance measures were compared to assess predictive accuracy (Nagelkerke R2, integrated Brier score), discrimination (Harrell's c-index, discrimination slope) and calibration (calibration in the large, calibration slope). RESULTS: The highest proportion of missingness for a single variable was 10.86% for the female model and 8.24% for the male model. Comparing the performance measures for complete case, mode, single and multiple imputation: the Nagelkerke R2 values for the female model were 0.1084, 0.1116, 0.1120 and 0.111-0.1120, with the male model exhibiting similar variation of 0.1050, 0.1078, 0.1078 and 0.1078-0.1081. Harrell's c-index also demonstrated small variation with values of 0.8666, 0.8719, 0.8719 and 0.8711-0.8719 for the female model and 0.8549, 0.8548, 0.8550 and 0.8550-0.8553 for the male model. CONCLUSION: In the scenarios examined in this study, mode imputation performed well when using a population health survey compared to single and multiple imputation when predictive performance is the main model goal. To generate unbiased hazard ratios, multiple imputation methods were superior. This study shows the need to consider the best imputation approach for predictive model development given the conditions of missing data and the goals of the analysis.
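Harrell's c-index, one of the discrimination measures compared above, is the share of comparable subject pairs in which the subject with the shorter survival time was assigned the higher predicted risk. The following Python sketch is a simplified version that ignores tied survival times; the toy times, event indicators and risk scores are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Simplified Harrell's c-index.

    A pair (i, j) is comparable when subject i had the event and a strictly
    shorter time than subject j. Tied risk scores count as half concordant.
    Pairs with tied times are skipped, which is a simplification.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


times = [2, 4, 6, 8]
events = [True, True, False, True]  # the third subject is censored
risk = [0.9, 0.4, 0.5, 0.1]
c_index = harrell_c_index(times, events, risk)
```

Here five pairs are comparable and four are ordered correctly, so the index is 0.8; a value of 0.5 would correspond to random ordering and 1.0 to perfect ordering.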


Subjects
Premature Mortality, Humans, Male, Female, Statistical Models, Risk Assessment/methods, Middle Aged, Statistical Data Interpretation, Adult
11.
J Exp Anal Behav ; 122(1): 52-61, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38837760

ABSTRACT

A challenge in carrying out matching analyses is to deal with undefined log ratios. If any reinforcer or response rate equals zero, the logarithm of the ratio is undefined and the data are unsuitable for analysis. There have been some tentative solutions, but they have not been thoroughly investigated. The purpose of this article is to assess the adequacy of five treatments: omit undefined ratios, use full information maximum likelihood, replace undefined ratios by the mean divided by 100, replace them by a constant 1/10, and add the constant .50 to ratios. Based on simulations, the treatments are compared on their estimations of variance accounted for, sensitivity, and bias. The results show that full information maximum likelihood and omitting undefined ratios had the best overall performance, with negligibly biased and more accurate estimates than mean divided by 100, constant 1/10, and constant .50. The study suggests that mean divided by 100, constant 1/10, and constant .50 should be avoided and recommends full information maximum likelihood to deal with undefined log ratios in matching analyses.
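The simplest of the five treatments, omitting undefined ratios, can be sketched in Python. The sketch assumes the usual generalized matching equation, log(B1/B2) = a * log(R1/R2) + log b, where the slope a is sensitivity and the intercept log b is bias; the abstract does not state the equation, and the data and function names below are hypothetical.

```python
import math


def log_ratio(numerator, denominator):
    """Return log10(numerator / denominator), or None when it is undefined."""
    if numerator <= 0 or denominator <= 0:
        return None
    return math.log10(numerator / denominator)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_omitting_undefined(responses, reinforcers):
    """Drop any condition whose response or reinforcer log ratio is undefined."""
    pairs = []
    for (b1, b2), (r1, r2) in zip(responses, reinforcers):
        y, x = log_ratio(b1, b2), log_ratio(r1, r2)
        if x is not None and y is not None:
            pairs.append((x, y))
    xs, ys = zip(*pairs)
    return fit_line(xs, ys)


# Hypothetical conditions; the last one has a zero rate and is dropped.
responses = [(100, 10), (50, 50), (10, 100), (40, 0)]
reinforcers = [(10, 1), (1, 1), (1, 10), (5, 0)]
sensitivity, log_bias = fit_omitting_undefined(responses, reinforcers)
```

With these toy values the three retained conditions lie on a line with slope 1 and intercept 0, so the fit returns a sensitivity near 1 and a log bias near 0.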


Subjects
Psychological Reinforcement, Likelihood Functions, Animals, Statistical Data Interpretation, Operant Conditioning, Computer Simulation, Humans, Reinforcement Schedule
12.
Bioengineering (Basel) ; 11(6)2024 May 21.
Article in English | MEDLINE | ID: mdl-38927759

ABSTRACT

This study presents a trial analysis that uses brain activity information obtained from mice to detect rheumatoid arthritis (RA) in its presymptomatic stages. Specifically, we confirmed that F759 mice, serving as a mouse model of RA that is dependent on the inflammatory cytokine IL-6, and healthy wild-type mice can be classified on the basis of brain activity information. We clarified which brain regions are useful for the presymptomatic detection of RA. We introduced a matrix completion-based approach to handle missing brain activity information to perform the aforementioned analysis. In addition, we implemented a canonical correlation-based method capable of analyzing the relationship between various types of brain activity information. This method allowed us to accurately classify F759 and wild-type mice, thereby identifying essential features, including crucial brain regions, for the presymptomatic detection of RA. Our experiment obtained brain activity information from 15 F759 and 10 wild-type mice and analyzed the acquired data. By employing four types of classifiers, our experimental results show that the thalamus and periaqueductal gray are effective for the classification task. Furthermore, we confirmed that classification performance was maximized when seven brain regions were used, excluding the electromyogram and nucleus accumbens.

13.
Bioengineering (Basel) ; 11(6)2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38927796

ABSTRACT

Motion capture (MoCap) technology, essential for biomechanics and motion analysis, faces challenges from data loss due to occlusions and technical issues. Traditional recovery methods, based on inter-marker relationships or independent marker treatment, have limitations. This study introduces a novel U-net-inspired bi-directional long short-term memory (U-Bi-LSTM) autoencoder-based technique for recovering missing MoCap data across multi-camera setups. Leveraging multi-camera and triangulated 3D data, this method employs a sophisticated U-shaped deep learning structure with an adaptive Huber regression layer, enhancing outlier robustness and minimizing reconstruction errors, proving particularly beneficial for long-term data loss scenarios. Our approach surpasses traditional piecewise cubic spline and state-of-the-art sparse low rank methods, demonstrating statistically significant improvements in reconstruction error across various gap lengths and numbers. This research not only advances the technical capabilities of MoCap systems but also enriches the analytical tools available for biomechanical research, offering new possibilities for enhancing athletic performance, optimizing rehabilitation protocols, and developing personalized treatment plans based on precise biomechanical data.
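The outlier robustness attributed to the Huber regression layer comes from the shape of the Huber loss, which is quadratic for small residuals and linear for large ones. The Python sketch below shows the standard loss with a fixed threshold `delta`; the paper describes an adaptive variant whose details are not given in the abstract, so the fixed threshold is an assumption.

```python
def huber_loss(residual, delta=1.0):
    """Standard Huber loss: quadratic within +/- delta, linear beyond it."""
    magnitude = abs(residual)
    if magnitude <= delta:
        return 0.5 * residual ** 2
    return delta * (magnitude - 0.5 * delta)


small = huber_loss(0.5)          # inside the quadratic region
large = huber_loss(3.0)          # in the linear region
squared_large = 0.5 * 3.0 ** 2   # what a plain squared-error loss would give
```

For the large residual the Huber loss is 2.5 against 4.5 for squared error, so a single outlying marker position pulls on the fit less strongly.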

14.
Qual Life Res ; 2024 Jun 16.
Article in English | MEDLINE | ID: mdl-38879861

ABSTRACT

PURPOSE: Non-response (NR) to patient-reported outcome (PRO) questionnaires may cause bias if not handled appropriately. Collecting reasons for NR is recommended, but how reasons for NR are related to missing data mechanisms remains unexplored. We aimed to explore this relationship for intermittent NRs. METHODS: Patients with multiple myeloma completed validated PRO questionnaires at enrolment and 12 follow-up time-points. NR was defined as non-completion of a follow-up assessment within seven days, which triggered contact with the patient, recording the reason for missingness and an invitation to complete the questionnaire (denoted "salvage response"). Mean differences between salvage and previous on-time scores were estimated for groups defined by reasons for NR using linear regression with clustered standard errors. Statistically significant mean differences larger than minimal important difference thresholds were interpreted as "missing not at random" (MNAR) mechanism (i.e. assumed to be related to declining health), and the remainder interpreted as aligned with "missing completely at random" (MCAR) mechanism (i.e. assumed unrelated to changes in health). RESULTS: Most (7228/7534 (96%)) follow-up questionnaires were completed; 11% (802/7534) were salvage responses. Mean salvage scores were compared to previous on-time scores by reason: those due to hospital admission, mental or physical reasons were worse in 10/22 PRO domains; those due to technical difficulties/procedural errors were no different in 21/22 PRO domains; and those due to overlooked/forgotten or other/unspecified reasons were no different in any domains. CONCLUSION: Intermittent NRs due to hospital admission, mental or physical reasons were aligned with MNAR mechanism for nearly half of PRO domains, while intermittent NRs due to technical difficulties/procedural errors or other/unspecified reasons generally were aligned with MCAR mechanism.

15.
Clin Infect Dis ; 2024 Jun 02.
Article in English | MEDLINE | ID: mdl-38824440

ABSTRACT

Data on alcohol use and incident tuberculosis (TB) infection are needed. In adults aged 15+ in rural Uganda (N=49,585), estimated risk of incident TB infection was 29.2% with alcohol use vs. 19.2% without (RR: 1.49; 95%CI: 1.40-1.60). There is potential for interventions to interrupt transmission among people who drink alcohol.

16.
J Invest Dermatol ; 2024 Jun 18.
Article in English | MEDLINE | ID: mdl-38888525

ABSTRACT

Hidradenitis suppurativa (HS) is a complex inflammatory skin condition affecting 0.1-4% of the population that leads to permanent scarring in the axilla, inframammary region, groin, and buttocks. Its complex pathogenesis involves genetics, innate and adaptive immunity, microbiota, and environmental stimuli. Specific populations have a higher incidence of HS, including females and Black individuals and those with associated comorbidities. HS registries and biobanks have set standards for the documentation of clinical data in the context of clinical trials and outcomes research, but collection, documentation, and reporting of these important clinical and demographic variables are uncommon in HS laboratory research studies. Standardization in the laboratory setting is needed because it helps to elucidate the factors that contribute mechanistically to HS symptoms and pathophysiology. The purpose of this article is to begin to set the stage for standardized reporting in the laboratory setting. We discuss how clinical guidelines can inform laboratory research studies, and we highlight what additional information is necessary for the use of samples in the wet laboratory and interpretation of associated mechanistic data. Through standardized data collection and reporting, data harmonization between research studies will transform our understanding of HS and lead to novel discoveries that will positively impact patient care.

17.
Article in English | MEDLINE | ID: mdl-38897847

ABSTRACT

In 2020, the NIH and FDA issued guidance documents that laid the foundation for human subject research during an unprecedented pandemic. To bridge these general considerations to actual applications in cardiovascular interventional device trials, the PAndemic Impact on INTErventional device ReSearch (PAIINTERS) Working Group was formed in early 2021 under the Predictable And Sustainable Implementation Of National CardioVascular Registries (PASSION CV Registries). The PAIINTERS Part I report, published by Rymer et al. [5], provided a comprehensive overview of the operational impact on interventional studies during the first year of the pandemic. PAIINTERS Part II focused on potential statistical issues related to bias, variability, missing data, and study power when interventional studies may start and end in different pandemic phases. Importantly, the paper also offers practical mitigation strategies to adjust or minimize the impact for both SATs and RCTs, providing a valuable resource for researchers and professionals involved in cardiovascular clinical trials.

18.
Contemp Clin Trials ; 143: 107602, 2024 Jun 08.
Article in English | MEDLINE | ID: mdl-38857674

ABSTRACT

BACKGROUND: Missing outcome data is common in trials, and robust methods to address this are needed. Most trial reports currently use methods applicable under a missing completely at random assumption (MCAR), although this strong assumption can often be inappropriate. OBJECTIVE: To identify and summarise current literature on the analytical methods for handling missing outcome data in randomised controlled trials (RCTs), emphasising methods appropriate for data missing at random (MAR) or missing not at random (MNAR). STUDY DESIGN AND SETTING: We conducted a methodological scoping review and identified papers through searching four databases (MEDLINE, Embase, CENTRAL, and CINAHL) from January 2015 to March 2023. We also performed forward and backward citation searching. Eligible papers discussed methods or frameworks for handling missing outcome data in RCTs or simulation studies with an RCT design. RESULTS: From 1878 records screened, our search identified 101 eligible papers. 90 (89%) papers described specific methods for addressing missing outcome data and 11 (11%) described frameworks for overall methodological approach. Of the 90 methods papers, 30 (33%) described methods under the MAR assumption, 48 (53%) explored methods under the MNAR assumption and 11 (12%) discussed methods under a hybrid of MAR and MNAR assumptions. Control-based methods under the MNAR assumption were the most common method explored, followed by multiple imputation under the MAR assumption. CONCLUSION: This review provides guidance on available analytic approaches for handling missing outcome data, particularly under the MNAR assumption. These findings may support trialists in using appropriate methods to address missing outcome data.

19.
Digit Health ; 10: 20552076241249631, 2024.
Article in English | MEDLINE | ID: mdl-38698826

ABSTRACT

Background: Micro-randomized trials (MRTs) enhance the effects of mHealth by determining the optimal components, timings, and frequency of interventions. Appropriate handling of missing values is crucial in clinical research; however, it remains insufficiently explored in the context of MRTs. Our study aimed to investigate appropriate methods for missing data in simple MRTs with uniform intervention randomization and no time-dependent covariates. We focused on outcome missing data depending on the participants' background factors. Methods: We evaluated the performance of available data analysis (AD) and multiple imputation in generalized estimating equations (GEE) and random effects (RE) models through simulations. The scenarios were examined based on the presence of unmeasured background factors and the presence of interaction effects. We used regression and propensity score methods for multiple imputation. These missing data handling methods were also applied to actual MRT data. Results: Without the interaction effect, AD was biased for GEE, but there was almost no bias for RE. With the interaction effect, estimates were biased for both. For multiple imputation, regression methods estimated without bias when the imputation models were correct, but bias occurred when the models were incorrect. However, this bias was reduced by including the random effects in the imputation model. In the propensity score method, bias occurred even when the missing probability model was correct. Conclusions: Without the interaction effect, AD of RE was preferable. When employing GEE or anticipating interactions, we recommend multiple imputation, especially with regression methods, including individual-level random effects.

20.
BMC Med Res Methodol ; 24(1): 104, 2024 May 03.
Article in English | MEDLINE | ID: mdl-38702599

ABSTRACT

BACKGROUND: Patient-Reported Outcome Measures (PROM) provide important information; however, missing PROM data threaten the interpretability and generalizability of findings by introducing potential bias. This study aims to provide insight into missingness mechanisms and inform future researchers on generalizability and possible methodological solutions to overcome missing PROM data problems during data collection and statistical analyses. METHODS: We identified 10,236 colorectal cancer survivors (CRCs) older than 18 years, diagnosed between 2014 and 2018 through the Danish Clinical Registries. We invited a random 20% (2,097) to participate in a national survey in May 2023. We distributed reminder e-mails at day 10 and day 20, and compared Initial Responders (response day 0-9), Subsequent Responders (response day 10-28) and Non-responders (no response after 28 days) in demographic and cancer-related characteristics and PROM-scores using linear regression. RESULTS: Of the 2,097 CRCs, 1,188 responded (57%). Of these, 142 (7%) were excluded leaving 1,955 eligible CRCs. 628 (32%) were categorized as initial responders, 418 (21%) as subsequent responders, and 909 (47%) as non-responders. Differences in demographic and cancer-related characteristics between the three groups were minor and PROM-scores only marginally differed between initial and subsequent responders. CONCLUSION: In this study of long-term colorectal cancer survivors, we showed that initial responders, subsequent responders, and non-responders exhibit comparable demographic and cancer-related characteristics. Among respondents, Patient-Reported Outcome Measures were also similar, indicating generalizability. Assuming Patient-Reported Outcome Measures of subsequent responders represent the answers non-responders would have given (were they available), it may be reasonable to judge the missingness mechanism as Missing Completely At Random.


Subjects
Cancer Survivors, Colorectal Neoplasms, Patient Reported Outcome Measures, Humans, Colorectal Neoplasms/therapy, Female, Male, Cancer Survivors/statistics & numerical data, Aged, Middle Aged, Denmark, Surveys and Questionnaires, Registries/statistics & numerical data, Adult, Quality of Life, Aged 80 and over