ABSTRACT
PURPOSE: Determining if group-level differences in health outcomes are meaningful has recently been neglected in favour of determining if individuals have experienced a meaningful change. We explore interpretation of a meaningful between-group difference (MBGD) in clinical outcome assessment scores, primarily in the context of randomized clinical trials. METHODS: We constructed a series of possible 'viewpoints' on how to conceptualize MBGD thresholds. Each viewpoint is discussed critically in terms of potential advantages and disadvantages, with simulated data to facilitate their consideration. RESULTS: Five viewpoints are presented and discussed. The first considers whether thresholds for meaningful within-individual change over time can be equally applied at the group-level, which is shown to be untenable. Viewpoints 2-4 consider what would have to be observed in treatment groups to conclude a meaningful between-group difference has occurred, framed in terms of the proportion of patients perceiving that they had meaningfully improved. The final viewpoint considers an alternative framework where stakeholders are directly questioned on the meaningfulness of varying magnitudes of between-group differences. The choice of a single threshold versus general interpretative guidelines is discussed. CONCLUSION: There does not appear to be a single method with clear face validity for determining MBGD thresholds. Additionally, the notion that such thresholds can be purely data-driven is challenged, where a degree of subjective stakeholder judgement is likely required. Areas for future research are proposed, to move towards robust method development.
ABSTRACT
PURPOSE: The minimal important change (MIC) is defined as the smallest within-individual change in a patient-reported outcome measure (PROM) that patients on average perceive as important. We describe a method to estimate this value based on longitudinal confirmatory factor analysis (LCFA). The method is evaluated and compared with a recently published method based on longitudinal item response theory (LIRT) in simulated and real data. We also examined the effect of sample size on bias and precision of the estimate. METHODS: We simulated 108 samples with various characteristics in which the true MIC was simulated as the mean of individual MICs, and estimated MICs based on LCFA and LIRT. Additionally, both MICs were estimated in existing PROMIS Pain Behavior data from 909 patients. In another set of 3888 simulated samples with sample sizes of 125, 250, 500, and 1000, we estimated LCFA-based MICs. RESULTS: The MIC was equally well recovered with the LCFA-method as using the LIRT-method, but the LCFA analyses were more than 50 times faster. In the Pain Behavior data (with higher scores indicating more pain behavior), an LCFA-based MIC for improvement was estimated to be 2.85 points (on a simple sum scale ranging 14-42), whereas the LIRT-based MIC was estimated to be 2.60. The sample size simulations showed that smaller sample sizes decreased the precision of the LCFA-based MIC and increased the risk of model non-convergence. CONCLUSION: The MIC can accurately be estimated using LCFA, but sample sizes need to be preferably greater than 125.
Subject(s)
Patient Reported Outcome Measures , Quality of Life , Humans , Quality of Life/psychology , Pain
ABSTRACT
PURPOSE: The minimal important change (MIC) in a patient-reported outcome measure is often estimated using patient-reported transition ratings as anchor. However, transition ratings are often more heavily weighted by the follow-up state than by the baseline state, a phenomenon known as "present state bias" (PSB). It is unknown if and how PSB affects the estimation of MICs using various methods. METHODS: We simulated 3240 samples in which the true MIC was simulated as the mean of individual MICs, and PSB was created by basing transition ratings on a "weighted change", differentially weighting baseline and follow-up states. In each sample we estimated MICs based on the following methods: mean change (MC), receiver operating characteristic (ROC) analysis, predictive modeling (PM), adjusted predictive modeling (APM), longitudinal item response theory (LIRT), and longitudinal confirmatory factor analysis (LCFA). The latter two MICs were estimated with and without constraints on the transition item slope parameters (LIRT) or factor loadings (LCFA). RESULTS: PSB did not affect MIC estimates based on MC, ROC, and PM but these methods were biased by other factors. PSB caused imprecision in the MIC estimates based on APM, LIRT and LCFA with constraints, if the degree of PSB was substantial. However, the unconstrained LIRT- and LCFA-based MICs recovered the true MIC without bias and with high precision, independent of the degree of PSB. CONCLUSION: We recommend the unconstrained LIRT- and LCFA-based MIC methods to estimate anchor-based MICs, irrespective of the degree of PSB. The APM-method is a feasible alternative if PSB is limited.
ABSTRACT
PURPOSE: Thresholds for meaningful within-individual change (MWIC) are useful for interpreting patient-reported outcome measures (PROM). Transition ratings (TR) have been recommended as anchors to establish MWIC. Traditional statistical methods for analyzing MWIC such as mean change analysis, receiver operating characteristic (ROC) analysis, and predictive modeling ignore problems of floor/ceiling effects and measurement error in the PROM scores and the TR item. We present a novel approach to MWIC estimation for multi-item scales using longitudinal item response theory (LIRT). METHODS: A Graded Response LIRT model for baseline and follow-up PROM data was expanded to include a TR item measuring latent change. The LIRT threshold parameter for the TR established the MWIC threshold on the latent metric, from which the observed PROM score MWIC threshold was estimated. We compared the LIRT approach and traditional methods using an example data set with baseline and three follow-up assessments differing by magnitude of score improvement, variance of score improvement, and baseline-follow-up score correlation. RESULTS: The LIRT model provided good fit to the data. LIRT estimates of observed PROM MWIC varied between 3 and 4 points score improvement. In contrast, results from traditional methods varied from 2 to 10 points-strongly associated with proportion of self-rated improvement. Best agreement between methods was seen when approximately 50% rated their health as improved. CONCLUSION: Results from traditional analyses of anchor-based MWIC are impacted by study conditions. LIRT constitutes a promising and more robust analytic approach to identifying thresholds for MWIC.
Subject(s)
Quality of Life , Humans , Quality of Life/psychology , ROC Curve , Registries
ABSTRACT
OBJECTIVES: There is growing interest in condition-specific preference measures, including the European Organisation for Research and Treatment of Cancer Quality of Life Utility Measure-Core 10 Dimensions (QLU-C10D). This research assessed the implications of using utility indices on the basis of the EQ-5D-3L, a mapping of EQ-5D-3L to the EQ-5D-5L, and the QLU-C10D, and compared their psychometric properties. METHODS: Data were taken from 8 phase 3 randomized controlled trials of nivolumab with or without ipilimumab for the treatment of solid tumors. Utilities for progression-related states were calculated using the UK and English value sets and incremental quality-adjusted life-years (QALYs) derived from established UK cost-effectiveness models. The psychometric properties of the utility indices were assessed using pooled trial data. RESULTS: Compared with the EQ-5D-3L index, the mapped EQ-5D-5L index yielded an average of 6% more and the QLU-C10D index an average of 2% fewer incremental QALYs for nivolumab versus comparators. All indices could differentiate between groups defined by performance status, cancer stage, or self-reported health status at baseline and detect meaningful changes in performance status, tumor response, health status, and quality of life over approximately 12 weeks of treatment. CONCLUSIONS: The lower QALY yield of the QLU-C10D was balanced by evidence of greater validity and responsiveness. Benefits gained from using the QLU-C10D may be apparent when treatments affect targeted symptoms and functional aspects, including sleep, bowel function, appetite, nausea, and fatigue. The observed differences in QALYs may not be sufficiently large to affect health technology assessment decisions.
Subject(s)
Antineoplastic Agents, Immunological/therapeutic use , Health Status , Neoplasms , Nivolumab/therapeutic use , Quality of Life , Surveys and Questionnaires , Clinical Trials as Topic , Quality-Adjusted Life Years , Randomized Controlled Trials as Topic
ABSTRACT
RATIONALE: Study 311 (E2007-G000-311; NCT02849626) was a Phase 3, multicenter, open-label single-arm study of adjunctive perampanel oral suspension in pediatric patients (aged 4 to <12 years) with partial-onset seizures (POS) (with/without secondarily generalized tonic-clonic seizures [SGTCS]) or primary generalized tonic-clonic seizures (PGTCS). Health-related quality of life (HRQoL) was an exploratory endpoint initially analyzed through simple descriptive summaries. The aim of this post hoc analysis was to provide a more thorough assessment of HRQoL. METHODS: This analysis focused on EQ-5D-Y data collected at Baseline, Week 23, and Week 52. Individual dimensions, visual analog scale (VAS) and summed misery index (MI) were evaluated at all visits and compared by seizure type (POS versus SGTCS versus PGTCS), age (4 to <7 versus 7 to <12), and use of concomitant enzyme-inducing antiepileptic drugs (EIAEDs) (yes versus no). Paretian Classification of Health Change (PCHC) analysis summarized the proportion of patients who showed improvement or deterioration in HRQoL. Waterfall plots assessed changes in EQ-5D-Y scores by treatment-emergent adverse events (TEAEs) and by reduction in seizure frequency. Health state utility values associated with differing seizure frequency states were estimated using a linear mixed model. RESULTS: One hundred and fifteen patients completed EQ-5D-Y at relevant study visits (Seizure type: POS n = 84 [of which 21 had SGTCS], PGTCS n = 31; Age: 4 to <7 years n = 30, 7 to <12 years n = 85; Concomitant EIAEDs: Yes n = 35, No n = 80). Completion rates out of those expected to complete EQ-5D-Y were high at both timepoints (84.4% at Week 23 and 97.2% at Week 52). Overall, VAS/MI remained stable over time (did not exceed minimal important difference); this was similar according to seizure type, age, and EIAED usage.
In patients with 'no problems' on any EQ-5D-Y dimension at Baseline, nearly all retained their full health at Week 23 (94.7%), and all retained it at Week 52 (100.0%). PCHC analysis showed fewer patients with POS experienced deterioration in EQ-5D-Y than patients with PGTCS at Week 23 (24.1% versus 42.1%). Not experiencing a TEAE, or remaining seizure-free, was associated with improvements in VAS score at Week 23 compared to those experiencing TEAEs or seizures, respectively. Health state utility values (HSUVs) were estimated as follows: seizure free (LS Mean 0.914 [95% CIs 0.587, 1.240]), ≥1 seizure per year (0.620 [0.506, 0.734]), ≥1 seizure per month (0.596 [0.338, 0.855]), ≥1 seizure per week (0.284 [-0.014, 0.582]). CONCLUSIONS: An in-depth analysis of EQ-5D-Y data allowed for a more nuanced exploration of HRQoL than previous descriptive summaries. Our findings provide evidence that perampanel as adjunctive therapy did not result in deterioration of patient HRQoL. The association between TEAEs or remaining seizure-free and HRQoL warrants further exploration. Increasing seizure frequency was associated with decreasing HSUVs; these can inform cost-effectiveness modeling of perampanel and other therapies aiming to reduce seizure frequency in pediatric patients.
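The PCHC and summed misery index mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes each EQ-5D-Y profile is a tuple of five dimension levels coded 1 (no problems) to 3, so a lower level is better; the function names, category labels, and example profiles are hypothetical and are not taken from the study.

```python
def misery_index(profile):
    """Summed misery index: the sum of the dimension levels (assumed 1-3 coding)."""
    return sum(profile)


def pchc(baseline, followup):
    """Classify the change between two profiles in the Paretian sense.

    'improved' : at least one dimension better and none worse
    'worsened' : at least one dimension worse and none better
    'mixed'    : some dimensions better and some worse
    'no change': identical profiles
    """
    if len(baseline) != len(followup):
        raise ValueError("profiles must have the same number of dimensions")
    better = any(f < b for b, f in zip(baseline, followup))
    worse = any(f > b for b, f in zip(baseline, followup))
    if better and worse:
        return "mixed"
    if better:
        return "improved"
    if worse:
        return "worsened"
    return "no change"


# Hypothetical profiles for one patient at Baseline and at a follow-up visit.
baseline = (2, 1, 3, 1, 1)
followup = (1, 1, 3, 1, 1)
print(pchc(baseline, followup), misery_index(baseline), misery_index(followup))
```

Because a 'mixed' profile cannot be ranked as better or worse without a value set, the classification reports it separately rather than forcing it into either category.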
Subject(s)
Quality of Life , Seizures , Anticonvulsants/therapeutic use , Child , Child, Preschool , Drug Therapy, Combination , Humans , Nitriles , Pyridones , Seizures/drug therapy , Treatment Outcome
ABSTRACT
BACKGROUND: The Endometriosis Symptom Diary (ESD) and Endometriosis Impact Scale (EIS) are patient-reported outcome measures developed to evaluate efficacy in clinical trials and clinical practice. The ESD is a daily electronic diary assessing symptom severity; the EIS is a weekly electronic diary assessing symptom impact. This study explored the importance of symptoms (ESD items) and impacts (EIS domains), perspectives on scoring algorithms, and clinically important difference (CID) thresholds to inform clinical trial score interpretation. METHODS: Endometriosis patients in Germany (n = 8) and the US (n = 17), and expert clinicians (n = 4) in Germany, the US, Spain, and Finland participated in semi-structured qualitative interviews comprising structured tasks. Interview transcripts were analyzed using thematic analysis techniques. RESULTS: Quality and severity of endometriosis-associated pelvic pain varied considerably among patients; some experienced pelvic pain daily, others during menstrual bleeding (dysmenorrhea) only. Patients and clinicians ranked "worst pelvic pain" as the most meaningful pain concept assessed by the ESD, followed by constant and short-term pelvic pain. Preferences for summarizing daily pain scores over the 28-day menstrual cycle depended on individuals' experience of pain: patients experiencing pain daily preferred scores summarizing data for all 28 days; patients primarily experiencing pain during selected days, and their treating clinicians preferred scores based on the most severe pain days. Initial CID exploration for the "worst pelvic pain" 0-10 numerical rating scale (0-10 NRS) revealed that, for most patients, a 2- or 3-point reduction was considered meaningful, depending on baseline severity. Patients and clinicians ranked "emotional well-being" and "limitations in physical activities" as the most important EIS domains. 
CONCLUSIONS: This study informs the use of the ESD and EIS as clinically relevant measures of endometriosis symptoms and their impact. Findings from the ESD highlight the importance of individual-patient assessment of pain experience and identify "worst pelvic pain" as the most meaningful symptom assessed. Aggregating scores over the 28-day menstrual cycle may inform meaningful endpoints for clinical trials. Diverse EIS concepts (e.g. impact on emotional well-being and physical activities) are meaningful to patients and clinicians, emphasizing the importance of evaluating the impact on both to comprehensively assess treatment efficacy and decisions. TRIAL REGISTRATION: Not applicable. Qualitative, non-interventional study; registration not required.
Subject(s)
Endometriosis/psychology , Medical Records , Pain Measurement/methods , Patient Reported Outcome Measures , Adult , Dysmenorrhea/psychology , Female , Finland/epidemiology , Germany/epidemiology , Humans , Male , Middle Aged , Pelvic Pain/psychology , Qualitative Research , Spain/epidemiology , Symptom Assessment/psychology , Treatment Outcome , United States/epidemiology
ABSTRACT
Objective: To evaluate psychometric performance of the NCCN-FACT Ovarian Cancer Symptom Index-18 (NFOSI-18) in advanced ovarian cancer. Methods: Cross-sectional, observational data from patients receiving treatment for ovarian cancer. Other measures included European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire core (EORTC QLQ-C30) and associated ovarian cancer module (EORTC QLQ-OV28) and Work Productivity and Activity Impairment. Internal consistency reliability, construct validity and anchor-based clinically important differences were assessed. Results: 897 patients were analyzed. Reliability was acceptable for all NFOSI-18 scores; construct validity was supported. Twelve anchors sufficiently correlated with NFOSI-18 scores and suggested clinically important differences: NFOSI-18 total score (5-7), disease-related symptoms - physical (3-4), disease-related symptoms - emotional (1), treatment side effects (2) and functional well-being (1-2). Conclusions: Results provide evidence of reliability and validity of NFOSI-18 scores. Generated CIDs will help improve interpretation of between-group treatment differences in clinical trials.
Lay abstract The National Comprehensive Cancer Network Functional Assessment of Cancer Therapy Ovarian Cancer Symptom Index-18 (NFOSI-18) is a questionnaire assessing the health of patients with ovarian cancer. When using such questionnaires, it is important to evidence that they produce consistent scores (referred to as reliability) and are aligned with other assessments of health (referred to as construct validity). It is also important to set guidelines on what constitutes a clinically important difference in scores, so clinicians and researchers can judge how effective new treatments are. This study analyzed data from 897 patients with advanced ovarian cancer, providing evidence of reliability and construct validity. Guidelines for clinically important differences were also provided. The findings support continued use of the NFOSI-18.
Subject(s)
Ovarian Neoplasms/psychology , Adult , Aged , Aged, 80 and over , Cross-Sectional Studies , Female , Humans , Middle Aged , Ovarian Neoplasms/therapy , Patient Reported Outcome Measures , Psychometrics , Reproducibility of Results
ABSTRACT
PURPOSE: The notion of what constitutes meaningful differences or changes in patient-reported outcome scores is represented by meaningful change thresholds (MCTs). Applying multiple methods to estimate MCTs inevitably results in a range of estimates; however, a single estimate or small range is sought in practice to enable consistent interpretation of scores. While current recommendations for triangulation are appropriate in principle, the vital step of moving from all estimates to a value or small range lacks clarity and is subjective in nature. This article aims to review current triangulation approaches and provide more robust recommendations than what is currently available. METHODS: Current approaches to perform triangulation are described and discussed. Anchor-based estimates are focussed upon due to their recognition as the most valid and developed approach. Recommendations for triangulation are provided. RESULTS: A correlation-weighted average of MCT estimates is recommended to triangulate multiple MCT estimates derived from a single study into a single value, where increased weighting is given to stronger anchor measures. The choice of method to triangulate estimates from several published studies is highly dependent on the availability of information within the publications. MCTs designed for between-group differences, within-group changes, and within-individual changes should be considered separately. CONCLUSION: The recommendations within this article provide a reliable and transparent approach to triangulation when a single value is sought, based on meta-analytic approaches. This approach is preferable to a simple mean of estimates where all are weighted equally, or through 'eyeballing' plotted estimates which is unreliable. We encourage researchers to adopt these methods, but to remain aware of the limitations within each method and further nuances in study design that result in heterogeneity. 
Sensitivity analyses with a range of plausible values are encouraged; however, the recommendations provide a suitable starting value for inferences. Unresolved issues in triangulation, requiring further exploration, are highlighted.
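The correlation-weighted average recommended in this abstract can be sketched in a few lines of Python. This sketch assumes the weight for each anchor-based estimate is the absolute correlation between that anchor and the outcome score; the abstract does not give the weighting formula, so the article itself may define the weights differently (for example through a transformation of the correlations). The function name and all numbers are hypothetical.

```python
def correlation_weighted_average(estimates, correlations):
    """Average MCT estimates, giving more weight to anchors that correlate
    more strongly with the outcome score (weights = |r|, an assumption)."""
    if not estimates or len(estimates) != len(correlations):
        raise ValueError("need one correlation per estimate")
    weights = [abs(r) for r in correlations]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one correlation must be non-zero")
    return sum(w * e for w, e in zip(weights, estimates)) / total_weight


# Hypothetical anchor-based estimates (in score points) and anchor correlations.
estimates = [4.0, 6.0, 5.0]
correlations = [0.30, 0.50, 0.40]

weighted = correlation_weighted_average(estimates, correlations)
simple = sum(estimates) / len(estimates)
print(f"weighted: {weighted:.2f}, simple mean: {simple:.2f}")
```

In this made-up case the weighted value is pulled above the simple mean because the largest estimate comes from the anchor with the strongest correlation.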
Subject(s)
Patient Reported Outcome Measures , Quality of Life , Humans , Quality of Life/psychology , Research Design
ABSTRACT
BACKGROUND: Defining the transition from relapsing-remitting multiple sclerosis (RRMS) to secondary progressive multiple sclerosis (SPMS) can be challenging and delayed. A digital tool (MSProDiscuss) was developed to facilitate physician-patient discussion in evaluating early, subtle signs of multiple sclerosis (MS) disease progression representing this transition. OBJECTIVE: This study aimed to determine cut-off values and corresponding sensitivity and specificity for predefined scoring algorithms, with or without including Expanded Disability Status Scale (EDSS) scores, to differentiate between RRMS and SPMS patients and to evaluate psychometric properties. METHODS: Experienced neurologists completed the tool for patients with confirmed RRMS or SPMS and those suspected to be transitioning to SPMS. In addition to age and EDSS score, each patient's current disease status (disease activity, symptoms, and its impacts on daily life) was collected while completing the draft tool. Receiver operating characteristic (ROC) curves determined optimal cut-off values (sensitivity and specificity) for the classification of RRMS and SPMS. RESULTS: Twenty neurologists completed the draft tool for 198 patients. Mean scores for patients with RRMS (n=89), transitioning to SPMS (n=47), and SPMS (n=62) were 38.1 (SD 12.5), 55.2 (SD 11.1), and 69.6 (SD 12.0), respectively (P<.001, each between-groups comparison). Area under the ROC curve (AUC) including and excluding EDSS were for RRMS (including) AUC 0.91, 95% CI 0.87-0.95, RRMS (excluding) AUC 0.88, 95% CI 0.84-0.93, SPMS (including) AUC 0.91, 95% CI 0.86-0.95, and SPMS (excluding) AUC 0.86, 95% CI 0.81-0.91. In the algorithm with EDSS, the optimal cut-off values were ≤51.6 for RRMS patients (sensitivity=0.83; specificity=0.82) and ≥58.9 for SPMS patients (sensitivity=0.82; specificity=0.84). The optimal cut-offs without EDSS were ≤46.3 and ≥57.8 and resulted in similar high sensitivity and specificity (0.76-0.86). 
The draft tool showed excellent interrater reliability (intraclass correlation coefficient=.95). CONCLUSIONS: The MSProDiscuss tool differentiated RRMS patients from SPMS patients with high sensitivity and specificity. In clinical practice, it may be a useful tool to evaluate early, subtle signs of MS disease progression indicating the evolution of RRMS to SPMS. MSProDiscuss will help assess the current level of progression in an individual patient and facilitate a more informed physician-patient discussion.
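One common way to pick a cut-off from an ROC analysis is to maximise Youden's J (sensitivity + specificity - 1). The abstract does not state which optimality criterion the study used, so the Python sketch below is only an illustration of that general technique; the function name and the ten example scores are hypothetical and are not study data.

```python
def best_cutoff(scores, is_positive):
    """Return (cutoff, sensitivity, specificity) maximising Youden's J,
    classifying a score >= cutoff as positive (here: SPMS)."""
    n_pos = sum(is_positive)
    n_neg = len(is_positive) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    best = None
    for cutoff in sorted(set(scores)):
        true_pos = sum(1 for s, p in zip(scores, is_positive) if p and s >= cutoff)
        true_neg = sum(1 for s, p in zip(scores, is_positive) if not p and s < cutoff)
        sensitivity = true_pos / n_pos
        specificity = true_neg / n_neg
        j = sensitivity + specificity - 1
        # Keep the first cut-off that achieves the highest J.
        if best is None or j > best[0]:
            best = (j, cutoff, sensitivity, specificity)
    return best[1], best[2], best[3]


# Hypothetical tool scores and SPMS status (True = SPMS, False = RRMS).
scores = [30, 35, 40, 45, 50, 55, 60, 65, 70, 75]
spms = [False, False, False, False, True, False, False, True, True, True]
print(best_cutoff(scores, spms))
```

Other criteria, such as fixing a minimum sensitivity or weighting the two error types differently, would generally select a different threshold from the same curve.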
Subject(s)
Multiple Sclerosis/diagnosis , Telemedicine/methods , Adult , Disease Progression , Female , Humans , Male , Physicians , Reproducibility of Results
ABSTRACT
OBJECTIVES: Thresholds for the minimally important difference (MID) or responder definition (RD) in health-related quality-of-life (HRQoL) scores are required to interpret the impact of an intervention or change in the trajectory of the condition which is meaningful to patients. This study aimed to establish MID and RD for the European Organisation for Research and Treatment of Cancer Quality of Life Multiple Myeloma questionnaire (EORTC QLQ-MY20). METHODS: A novel mixed-methods approach was applied by utilizing both existing clinical trial data and prospective patient interviews. Anchor-based, distribution-based, and qualitative-based estimates of meaningful change were triangulated to form recommended RDs for each scale of the EORTC QLQ-MY20. Anchor-based MIDs were summarized using weighted correlation. RESULTS: Recommended MIDs were as follows: Disease Symptoms (DS 10 points), Side Effects of Treatment (SE 10 points), Body Image (BI 13 points), and Future Perspective (FP 9 points). Recommended RDs were as follows: DS (16 improvement; 11 worsening), SE (6 improvement; 9 worsening), BI (33 improvement; 33 worsening), and FP (11 improvement; 11 worsening). CONCLUSIONS: The study generated estimates of the MID and RD for each scale of the EORTC QLQ-MY20. Published estimates will enable investigators and clinicians to adopt these as standard for interpretation and for hypothesis testing. Consequently, analyses from trials of different interventions can be more comparable.
Subject(s)
Multiple Myeloma/therapy , Quality of Life , Surveys and Questionnaires , Aged , Female , Humans , Male , Middle Aged , Prospective Studies
ABSTRACT
PURPOSE: Pancreatic cancer and its treatments impact patients' symptoms, functioning, and quality of life. Content-valid patient-reported outcome (PRO) instruments are required to assess outcomes in clinical trials. This study aimed to: (a) conceptualise the patient experience of pancreatic cancer; (b) identify relevant PRO instruments; (c) review the content validity of mapped instruments to guide PRO measurement in clinical trials. METHODS: Qualitative literature and interviews with clinicians and patients were analysed thematically to develop a conceptual model of patient experience. PRO instruments were reviewed against the conceptual model to identify gaps in measurement. Cognitive debriefing explored PRO conceptual relevance and patients' understanding. RESULTS: Patients in the USA (N = 24, aged 35-84) and six clinicians (from US and Europe) were interviewed. Pre-diagnosis, pain was the most frequently reported symptom (N = 21). Treatments included surgery, radiation, chemotherapy, and immunotherapy. Surgery was associated with acute pain and gastrointestinal symptoms. Chemotherapy/chemoradiation side effects were cyclical and included fatigue/tiredness (N = 21), appetite loss (N = 15), bowel problems (N = 15), and nausea/vomiting (N = 15). Patients' functioning and well-being were impaired. The literature review identified 49 PRO measures; the EORTC QLQ-C30/PAN26 were used most frequently and mapped with interview concepts. Patients found the EORTC QLQ-C30/PAN26 to be understandable and relevant; neuropathic side effects were suggested additions. CONCLUSIONS: This is the first study to develop a conceptual model of patients' experience of metastatic/recurrent pancreatic cancer and explore the content validity of the EORTC QLQ-C30/PAN26 following therapeutic advances. The EORTC QLQ-C30/PAN26 appears conceptually relevant; additional items to assess neuropathic side effects are recommended. A recall period should be stated throughout to standardise responses.
Subject(s)
Pancreatic Neoplasms/epidemiology , Pancreatic Neoplasms/psychology , Patient Reported Outcome Measures , Quality of Life/psychology , Adult , Aged , Aged, 80 and over , Female , Humans , Male , Middle Aged , Neoplasm Metastasis , Surveys and Questionnaires
ABSTRACT
BACKGROUND: Infertility has a negative impact on quality of life (QoL) and well-being of affected individuals and couples. A variety of patient-reported outcome (PRO) measures to assess infertility-related QoL are available; however, there is a concern regarding potential issues with their development methodology, validation and use. This review aimed to i) identify PRO measures used in infertility interventional studies ii) assess validation evidence to identify a reliable, valid PRO measure to assess changes in QoL or treatment satisfaction in clinical studies with female patients following treatment with novel therapies iii) identify potential gaps in evidence for validity. METHODS: A structured literature search of Medline, Embase, and the Cochrane Library (accessed in September 2015) was conducted using pre-defined search terms. The identified publications were reviewed applying eligibility criteria to select interventional female infertility studies using PROs. Infertility-specific PRO measures assessing QoL, treatment satisfaction or psychiatric health, and included in studies by ≥2 research groups were selected and critically reviewed in light of scientific and regulatory guidance (e.g. FDA PRO Guidance for Industry) for evidence of content validity, psychometric strength, and patient acceptability. RESULTS: The literature search and hand-searching yielded 122 publications; 78 unique PRO measures assessing QoL, treatment satisfaction or psychiatric health were identified. Five PRO measures met the selection criteria for detailed review: Fertility Quality of Life (FertiQoL); Fertility Problem Inventory (FPI); Fertility Problem Stress (FPS); Infertility Questionnaire (IFQ); Illness Cognitions Questionnaire adapted for Infertility (ICQ-I). None of the PRO measures met all validation criteria. 
The FertiQoL was the most widely used infertility-specific PRO measure to assess QoL in interventional studies, with reasonable evidence for adequate content validity, psychometric strength, and linguistic validation. However, gaps in evidence remain including test-retest reliability and thresholds for interpreting clinically important changes. While the FPI demonstrated reasonable evidence for content and psychometric validity, its utility as an outcome measure is limited by a lack of recall period. CONCLUSION: The FertiQoL and the FPI are potentially useful measures of infertility-related QoL in interventional studies. Further research is recommended to address gaps in evidence and confirm both PRO measures as reliable assessments of patient outcomes.
Subject(s)
Infertility, Female/psychology , Patient Reported Outcome Measures , Quality of Life/psychology , Adult , Female , Fertility , Humans , Psychometrics , Reproducibility of Results , Surveys and Questionnaires
ABSTRACT
BACKGROUND: Medication non-adherence is a common issue in chronic illness. The World Health Organization has recognized a need for a valid and reliable method of measuring adherence to understand and mitigate non-adherence. This study aimed to psychometrically evaluate the English version of the Adelphi Adherence Questionnaire (ADAQ©), a questionnaire designed to assess patient-reported medication adherence across multiple therapy areas, in patients with Osteoarthritis (OA). METHODOLOGY: Data from the Adelphi OA Disease Specific Programme™, a survey of physicians and their consulting adult patients with OA conducted in the United States, November 2020 to March 2021, was used to assess the psychometric properties of the ADAQ. Patients completed the ADAQ, Adherence to Refills and Medication Scale (ARMS), Western Ontario and McMaster Universities Arthritis Index (WOMAC), and EQ-5D-3L. The measurement model of the 13-item ADAQ was assessed and refined using latent variable modelling (Multiple Indicator Multiple Cause, confirmatory and exploratory factor analyses, item response theory, Mokken scaling, and bifactor analyses). Correlational analyses (Spearman's rank and polyserial as appropriate) with ARMS, WOMAC, and EQ-5D-3L scores assessed construct validity. Anchor- and distribution-based analyses were performed to estimate between-group clinically important differences (CID). RESULTS: Overall, 723 patients were included in this analysis (54.5% female, 69.0% aged ≥ 60). Latent variable modelling indicated a unidimensional reflective model was appropriate, with a bifactor model confirming an 11-item essentially unidimensional score. Items 12 and 13 were excluded from scoring as they measured a different concept. The ADAQ had high internal reliability with omega hierarchical and Cronbach's alpha coefficients of 0.89 and 0.97, respectively. Convergent validity was supported by moderate correlations with items of the ARMS, and physician-reported adherence and compliance. 
Mean differences in ADAQ score between high and low adherence groups yielded CID estimates between 0.49 and 1.05 points, with a correlation-weighted average of 0.81 points. CONCLUSION: This scoring model showed strong construct validity and internal consistency reliability when assessing medication adherence in OA. Future work should focus on confirming validity across a range of disease areas.
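For readers unfamiliar with the internal consistency statistic reported in this abstract, Cronbach's alpha can be computed from item-level responses as k/(k-1) × (1 − sum of item variances / variance of the total score). The Python sketch below uses only the standard library; the function name and the response matrix are hypothetical and are not ADAQ data, and omega hierarchical, which requires a fitted bifactor model, is not covered.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a response matrix (one row per respondent,
    one column per item), using sample variances."""
    if len(rows) < 2 or len(rows[0]) < 2:
        raise ValueError("need at least two respondents and two items")
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses from five respondents to a three-item scale.
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 4],
]
print(round(cronbach_alpha(responses), 3))
```

Alpha rises with both the inter-item correlations and the number of items, which is one reason it is usually read alongside model-based coefficients rather than on its own.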
Subject(s)
Medication Adherence , Osteoarthritis , Psychometrics , Humans , Female , Male , Psychometrics/methods , Osteoarthritis/drug therapy , Osteoarthritis/psychology , Medication Adherence/psychology , Medication Adherence/statistics & numerical data , Middle Aged , Surveys and Questionnaires , Aged , Reproducibility of Results , Adult , United States
ABSTRACT
BACKGROUND: Breast cancer is one of the most common cancers in women. Patient-reported outcome measures are used to evaluate patients' health-related quality of life in clinical breast cancer studies. This study evaluated the structure, validity, reliability, and responsiveness of the National Comprehensive Cancer Network-Functional Assessment of Cancer Therapy-Breast Cancer Symptom Index (NFBSI-16) subscales in a clinical trial featuring patients with advanced/metastatic breast cancer (aBC), and estimated NFBSI-16 meaningful change thresholds. METHODS: Data from 101 patients with aBC enrolled in a phase II trial (Xenera-1) were included for psychometric evaluation of the NFBSI-16. Subscale structure was evaluated by assessing inter-item correlations, item-total correlations, and internal consistency (cycles 2 and 5). Validity was assessed using scale-level convergent validity (cycles 2 and 5) and known-groups (Baseline). Reliability was analysed via test-retest at cycles 3-4, and responsiveness to improvement and worsening was evaluated at cycles 5, 7, and 9. Meaningful change thresholds were estimated using anchor-based methods (supported by distribution-based methods) at cycles 5, 7, and 9. RESULTS: NFBSI-16 internal consistency was acceptable, but item-total correlations suggested that its subscales and the GP5 item (side-effect of treatment) scores may be preferred over a total score. Convergent and known-groups evidence supported NFBSI-16 validity. Test-retest reliability was good to excellent for Total and DRS-P (disease-related symptoms: physical) scales, and moderate for the GP5 item. Responsiveness to worsening was generally demonstrated, but responsiveness to improvement could not be demonstrated due to limited observed improvement. Anchor-based meaningful change thresholds were estimated for DRS-P and Total scores. CONCLUSION: This study provides evidence that the NFBSI-16 has desirable psychometric properties for use in clinical studies in aBC. 
It also provides estimates of group- and individual-level meaningful change thresholds to facilitate score interpretation in future aBC research.
Subject(s)
Breast Neoplasms, Patient Reported Outcome Measures, Psychometrics, Quality of Life, Humans, Female, Breast Neoplasms/psychology, Breast Neoplasms/therapy, Reproducibility of Results, Middle Aged, Psychometrics/methods, Adult, Aged, Surveys and Questionnaires
ABSTRACT
Background: Assessment of reliability is one of the key components of the validation process designed to demonstrate that a novel clinical measure assessed by a digital health technology tool is fit-for-purpose in clinical research, care, and decision-making. Reliability assessment contributes to characterization of the signal-to-noise ratio and measurement error and is the first indicator of potential usefulness of the proposed clinical measure. Summary: Methodologies for reliability analyses are scattered across literature on validation of PROs, wet biomarkers, etc., yet are equally useful for digital clinical measures. We review a general modeling framework and statistical metrics typically used for reliability assessments as part of the clinical validation. We also present methods for the assessment of agreement and measurement error, alongside modified approaches for categorical measures. We illustrate the discussed techniques using physical activity data from a wearable device with an accelerometer sensor collected in clinical trial participants. Key Messages: This paper provides statisticians and data scientists, involved in development and validation of novel digital clinical measures, an overview of the statistical methodologies and analytical tools for reliability assessment.
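As an illustration of the kind of reliability metrics such a review covers, the sketch below computes an intraclass correlation coefficient and a standard error of measurement from repeated measurements. It assumes a one-way random-effects model, ICC(1,1), which is only one of several ICC forms and is not necessarily the one used in the paper. The function name and the step-count data are hypothetical.

```python
import math


def icc_oneway(measurements):
    """Return (ICC(1,1), SEM) for a list of per-subject repeated measurements.

    Assumes a balanced design: every subject has the same number k >= 2
    of repeated measurements. Uses one-way random-effects ANOVA.
    """
    n = len(measurements)
    k = len(measurements[0])
    if n < 2 or k < 2 or any(len(row) != k for row in measurements):
        raise ValueError("need a balanced design with n >= 2 and k >= 2")

    subject_means = [sum(row) / k for row in measurements]
    grand_mean = sum(subject_means) / n

    # Between-subject mean square ("signal").
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square ("noise").
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(measurements, subject_means) for x in row
    ) / (n * (k - 1))

    icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    sem = math.sqrt(ms_within)  # standard error of measurement
    return icc, sem


# Hypothetical average daily step counts (thousands) for four participants
# in two consecutive weeks; illustrative values only.
steps = [[5.0, 6.0], [8.0, 8.0], [3.0, 4.0], [10.0, 9.0]]
icc, sem = icc_oneway(steps)
print(f"ICC(1,1) = {icc:.3f}, SEM = {sem:.3f}")
```

The ICC expresses between-subject variance as a share of total variance, while the SEM reports the measurement error in the units of the measure itself.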
ABSTRACT
BACKGROUND AND OBJECTIVE: The chest-related electronic patient reported outcome (ePRO) diary was recently developed to assess chest-related symptoms experienced by pediatric and adolescent populations during upper respiratory tract infections (URTI). The objective of this research was the psychometric evaluation of the chest-related ePRO diary in pediatric, adolescent and adult participants. METHODS: This non-interventional, psychometric validation study involved participants (N = 195; n = 42 aged 6-8 years; n = 47 aged 9-11 years; n = 55 aged 12-17 years; n = 51 aged 18+ years) completing the chest-related ePRO diary twice daily for 10 days while experiencing an acute URTI. Preliminary item-level performance and dimensionality results, along with consideration of previous qualitative findings, were used to inform item reduction decisions, the structure of the measure and scoring algorithm development. Subsequent analyses on the finalized measure included assessments of reliability (internal consistency and test-retest reliability), construct validity (convergent validity and known groups validity) and ability to detect change. Findings were compared between the age groups to assess the psychometric properties of the chest-related ePRO diary and to characterize potential differences in the symptom experience of children, adolescents, and adults. RESULTS: The measure demonstrated strong quality of completion and showed relatively similar trajectories of symptom scores over time within different age subgroups and good item response distribution properties. Exploratory factor analysis supported a one-factor solution in the total population and within age subgroups, and test-retest reliability of the measure was strong (intra-class correlation: 0.843-0.894 between Visit 1 and Day 1). 
The measure also demonstrated strong construct validity through high correlations with relevant items on the Child Cold Symptom Questionnaire (CCSQ), strong known groups validity (with statistically significant differences between severity groups) and was responsive to change over time with change groups defined based on change on global items. CONCLUSION: The findings demonstrate that the chest-related ePRO diary provides a valid, reliable, responsive measure of chest congestion symptoms experienced with the common cold in pediatric and adolescent populations, and that only minor differences are present in the disease trajectory when comparing adults to younger participants, supporting the use of the measure in interventional studies.
Subject(s)
Electronics, Patient Reported Outcome Measures, Adult, Adolescent, Humans, Child, Psychometrics/methods, Reproducibility of Results, Surveys and Questionnaires
ABSTRACT
BACKGROUND: Although the psychometric properties of patient-reported outcome measures (e.g. the 22-item Sino-nasal Outcomes Test [SNOT-22]) in chronic rhinosinusitis with nasal polyps (CRSwNP) have been defined, these properties have not been extensively studied in patients with very severe CRSwNP, as defined by recurrent disease despite ≥ 1 previous surgery and a current need for further surgery. Therefore, the psychometric properties of the symptom visual analogue scales (VAS) were evaluated, and meaningful within-patient change thresholds were calculated for VAS and SNOT-22. METHODS: SYNAPSE (NCT03085797), a randomized, double-blind, placebo-controlled, 52-week trial, assessed the efficacy and safety of 4-weekly mepolizumab 100 mg subcutaneously added to standard of care in very severe CRSwNP. Enrolled patients (n = 407) completed symptom VAS (six items) daily and SNOT-22 every 4 weeks from baseline until Week 52. Blinded psychometric assessment of individual and composite VAS was performed post hoc, including anchor-based thresholds for meaningful within-patient changes for VAS and SNOT-22, supported by cumulative distribution function and probability density function plots. The effect of mepolizumab versus placebo for 52 weeks on VAS and SNOT-22 scores was then assessed against these thresholds using unblinded data. RESULTS: Internal consistency was acceptable for VAS and SNOT-22 scores (Cronbach's α-coefficients ≥ 0.70). Test-retest reliability was demonstrated for all symptom VAS (intra-class correlation coefficients > 0.75). Construct validity was acceptable between individual and composite VAS and SNOT-22 total score (r = 0.461-0.598) and between individual symptom VAS and corresponding SNOT-22 items (r = 0.560-0.780), based upon pre-specified ranges. Known-groups validity assessment demonstrated generally acceptable validity based on factors associated with respiratory health, with all VAS responsive to change. 
Mepolizumab treatment was associated with significantly increased odds of meeting or exceeding meaningful within-patient change thresholds, derived for this very severe cohort using six anchor groups for individual VAS (odds ratio [OR] 2.19-2.68) at Weeks 49-52, and SNOT-22 (OR 1.61-2.96) throughout the study. CONCLUSIONS: Symptoms VAS and SNOT-22 had acceptable psychometric properties for use in very severe CRSwNP. Mepolizumab provided meaningful within-patient improvements in symptom severity and health-related quality of life versus placebo, indicating mepolizumab provides substantial clinical benefits in very severe CRSwNP.
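The responder-style comparison described above (odds of meeting or exceeding a meaningful within-patient change threshold, by treatment arm) can be sketched in Python as follows. This is an unadjusted 2x2 calculation under stated assumptions; the abstract does not say how the reported odds ratios were estimated, and a model-based estimate would differ. The threshold, change scores, and function names are hypothetical and are not SYNAPSE data.

```python
def count_responders(changes, threshold):
    """Count patients whose score decreased by at least `threshold`.

    Assumption: lower scores mean fewer symptoms, so improvement is a
    negative change from baseline.
    """
    return sum(1 for change in changes if change <= -threshold)


def odds_ratio(resp_active, n_active, resp_control, n_control):
    """Unadjusted odds ratio of being a responder, active vs control."""
    non_active = n_active - resp_active
    non_control = n_control - resp_control
    if min(resp_active, non_active, resp_control, non_control) <= 0:
        raise ValueError("every cell of the 2x2 table must be positive")
    return (resp_active / non_active) / (resp_control / non_control)


# Hypothetical change-from-baseline scores and threshold (illustrative only).
threshold = 2.5
active_changes = [-3.0, -1.0, -4.5, -2.5, 0.5, -3.5]
control_changes = [-1.0, -3.0, 0.0, -0.5, -2.0, 1.0]

resp_a = count_responders(active_changes, threshold)
resp_c = count_responders(control_changes, threshold)
responder_or = odds_ratio(resp_a, len(active_changes), resp_c, len(control_changes))
print(resp_a, resp_c, responder_or)
```

Dichotomizing at a within-patient threshold answers "how many patients improved meaningfully", a different question from comparing mean scores between arms.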
Patients with chronic rhinosinusitis (CRS) often have blocked or runny noses, and loss of sense of smell. They can also have sac-like growths in their nose called nasal polyps, which often require surgical removal. The symptoms of CRS with nasal polyps can affect quality of life. In a clinical study named SYNAPSE, a new treatment option called mepolizumab reduced the size and severity of nasal polyps in patients suffering from very severe CRS with nasal polyps, compared with placebo. Mepolizumab also reduced the need for nasal polyp surgery. The SYNAPSE study also measured whether 1 year of mepolizumab treatment improved patients' symptoms and quality of life. This was evaluated by asking patients to complete two separate tasks. These tasks were rating symptoms on a visual analogue scale (VAS) and completing a quality of life questionnaire called SNOT-22. The objective of this analysis was to see if these questionnaires accurately assessed a patient's quality of life. The analysis also assessed how many patients had major improvements in their symptoms with mepolizumab. Overall, data from 407 patients in the SYNAPSE study were analyzed. Results showed that both the VAS and SNOT-22 questionnaires accurately captured CRS symptoms and quality of life. In addition, patients treated with mepolizumab for 1 year had improvements in quality of life compared with placebo. In conclusion, these findings suggest that the VAS and SNOT-22 questionnaires are appropriate evaluation tools for patients with very severe CRS with nasal polyps. The findings also show that mepolizumab treatment is beneficial for these patients.
Subject(s)
Nasal Polyps, Rhinitis, Sinusitis, Humans, Nasal Polyps/complications, Quality of Life, Psychometrics, Reproducibility of Results, Rhinitis/complications, Chronic Disease, Sinusitis/complications
ABSTRACT
INTRODUCTION: Transition ratings (TRs) are single-item measures which ask patients to report on their health change. They allow for a simple assessment of improvement or deterioration and are frequently used as an "anchor" to determine interpretation thresholds on a patient-reported outcome measure (PROM). Despite their widespread use, a routinely applicable method to assess their reliability is lacking. This paper introduces a method to estimate the reliability of TRs based on confirmatory factor analysis (CFA) for categorical data. METHOD: We modelled longitudinal PROM data as independent factors representing Time 1 and Time 2 in a CFA model. PROM items taken at Time 1 (T1) loaded on the first factor, whereas the same items taken at Time 2 (T2) loaded on the second. The TR item loaded onto both T1 and T2 factors. Three models with various constraints on the loadings and thresholds were examined. The communality (R²) statistic was used as a measure of the TR reliability. The approach was evaluated using simulated data and exemplified in four empirical datasets. RESULTS: The simplest CFA model without constraints on the item loadings and thresholds performed equivalently to models with constraints on loadings and thresholds over time. Further constraining the TR item loadings to be equal and opposite over time biased the TR reliability estimates if the T1 and T2 loadings differed in magnitude. In the four empirical datasets, reliability of TRs ranged from 0.27 to 0.48. In three examples the TR had a numerically stronger loading on T2 than on T1. DISCUSSION AND CONCLUSIONS: Results support the use of the proposed method in understanding the reliability of TRs. Empirical study results reflect the typical range of reliability that has previously been reported for single items. Methodological considerations to improve TR reliability are presented, and developments of this method are posited.
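To make the communality statistic concrete, the sketch below computes the variance in a transition rating explained by two factors from its standardized loadings. It assumes standardized factors and a standard two-factor variance decomposition; the optional factor correlation is an added assumption (the abstract describes the T1 and T2 factors only as "independent"), and the loadings shown are hypothetical, not estimates from the paper.

```python
def transition_rating_communality(loading_t1, loading_t2, factor_correlation=0.0):
    """Communality (R^2) of an item loading on two standardized factors.

    R^2 = l1^2 + l2^2 + 2 * l1 * l2 * phi, where phi is the correlation
    between the T1 and T2 factors (0.0 if the factors are uncorrelated).
    """
    if not -1.0 <= factor_correlation <= 1.0:
        raise ValueError("factor correlation must lie in [-1, 1]")
    return (
        loading_t1 ** 2
        + loading_t2 ** 2
        + 2.0 * loading_t1 * loading_t2 * factor_correlation
    )


# Hypothetical standardized loadings: a TR of improvement would be expected
# to load negatively on the T1 factor and positively on the T2 factor.
r2_uncorrelated = transition_rating_communality(-0.4, 0.5)
r2_correlated = transition_rating_communality(-0.4, 0.5, factor_correlation=0.5)
print(round(r2_uncorrelated, 2), round(r2_correlated, 2))
```

With loadings of opposite sign, a positive correlation between the factors lowers the communality, which is one reason the treatment of the T1-T2 relationship matters when interpreting this reliability estimate.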