ABSTRACT
Over half a century ago, the terms "pragmatic" and "explanatory" were introduced to biomedicine by Schwartz and Lellouch, presenting two distinct conceptual approaches to trial design. Today, we frequently say that there are pragmatic trials and there are explanatory trials. Pragmatic trials inform decision-making in practice; explanatory trials aim to understand the mechanism of an intervention. They are often perceived as diametrically opposed extremes of a continuum. In this commentary, we argue that the digitalization of health care and clinical research has paved the way for modern trial designs and opened new avenues, and that there is no such continuum. Since the groundbreaking work of Schwartz and Lellouch, new approaches and methods have become available that allow researchers to address pragmatic and explanatory questions in parallel in the same trial. The emerging availability of routinely collected "real-world" data, the development of decentralized trial techniques, and the creation of digital biomarkers make it possible to observe health outcomes with minimal or no interference in real-world care. This overcomes previous limitations on studying mechanisms of interventions in routine care and makes the idea of a continuum obsolete. We argue that pragmatism and explanatorism need to be understood as two distinct but compatible conceptual dimensions to open new perspectives for using novel technologies to design the most informative clinical trials and to make better clinical and regulatory decisions. We base our argument on an analysis of the concept of a continuum and highlight its limitations. We review key trial design features and introduce a new concept that sees explanatory design features as fundamental, invasive or non-invasive, sufficient or insufficient. We describe their impact on pragmatism and explanatorism and show how multidimensional pragmatic-explanatory trials that are most useful are possible today.
ABSTRACT
OBJECTIVE: To investigate the longitudinal dynamics of serum glial fibrillary acidic protein (sGFAP) and serum neurofilament light chain (sNfL) levels in people with multiple sclerosis (pwMS) under B-cell depleting therapy (BCDT) and their capacity to prognosticate future progression independent of relapse activity (PIRA) events. METHODS: A total of 362 pwMS (1,480 samples) starting BCDT in the Swiss Multiple Sclerosis (MS) Cohort were included. sGFAP levels in 2,861 control persons (4,943 samples) provided normative data to calculate adjusted Z scores. RESULTS: Elevated sGFAP levels (Z score >1) at 1 year were associated with a higher hazard for PIRA (hazard ratio [HR]: 1.80 [95% CI: 1.17-2.78]; p = 0.0079) than elevated sNfL levels (HR, 1.45 [0.95-2.24], p = 0.0886) in a combined model. Independent of PIRA events, sGFAP levels longitudinally increased by 0.49 Z score units per 10 years follow-up (estimate, 0.49 [0.29, 0.69], p < 0.0001). In patients experiencing PIRA, sGFAP Z scores were 0.52 Z score units higher versus stable patients (0.52 [0.22, 0.83], p = 0.0009). Different sNfL Z score trajectories were found in pwMS with versus without PIRA (interaction p = 0.0028), with an average decrease of 0.92 Z score units per 10 years observed without PIRA (-0.92 [-1.23, -0.60], p < 0.0001), whereas levels in patients with PIRA remained high. INTERPRETATION: Elevated sGFAP and lack of drop in sNfL after BCDT start are associated with increased risk of future PIRA. These findings provide a rationale for combined monitoring of sNfL and sGFAP in pwMS starting BCDT to predict the risk of PIRA, and to use sGFAP as an outcome in clinical trials aiming to impact on MS progressive disease biology. ANN NEUROL 2024.
ABSTRACT
RATIONALE, AIMS, AND OBJECTIVES: Previous studies demonstrated that the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system, a leading method for evaluating the certainty (quality) of scientific evidence (CoE), cannot reliably differentiate between various levels of CoE when the objective is to accurately assess the magnitude of the treatment effect. An estimated effect size is a function of multiple factors, including the true underlying treatment effect, biases, and other nonlinear factors that affect the estimate in different directions. We postulate that non-weighted, simple linear tallying can provide more accurate estimates of the probability of a true estimate of treatment effects as a function of CoE. METHODS: We reasoned that stable treatment effect estimates over time indicate truthfulness. We compared odds ratios (ORs) from meta-analyses (MAs) before and after updates, hypothesising that a ratio of odds ratios (ROR) equal to 1 will be more commonly observed in higher versus lower CoE. We used a subset of a previously analysed data set consisting of 82 Cochrane pairs of MAs in which CoE has not changed with the updated MA. If the linear model is valid, we would expect a decrease in the number of ROR = 1 cases as we move from high to moderate, low, and very low CoE. RESULTS: We found a linear relationship between the probability of a potentially 'true' estimate of treatment effects as a function of CoE (assuming a 10% ROR error margin) (R² = 1; p = 0.001). The probability of potentially 'true' estimates decreases by 21% (95% CI: 18%-24%) for each drop in the rating of CoE. A linear relationship with a 5% ROR error margin was less clear, likely due to a smaller sample size. Still, higher CoE showed a significantly greater probability of 'true' effects (53%) compared to non-high (i.e., moderate, low, or very low) CoE (25%); p = 0.032.
CONCLUSION: This study confirmed a linear relationship between CoE and the probability of potentially 'true' estimates. We found that the probability of potentially 'true' estimates decreases by about 20% for each drop in CoE (from about 80% for high to 55% for moderate, 35% for low, and 15% for very low CoE).
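The stability check described in the METHODS above can be illustrated with a minimal sketch. The abstract does not state how the error margin around ROR = 1 is defined, so the symmetric interval used here is an assumption, and the function names and input values are hypothetical, chosen only for illustration.

```python
def ratio_of_odds_ratios(or_original: float, or_updated: float) -> float:
    """Ratio of the updated odds ratio to the original odds ratio (ROR)."""
    if or_original <= 0 or or_updated <= 0:
        raise ValueError("odds ratios must be positive")
    return or_updated / or_original


def is_stable(ror: float, margin: float = 0.10) -> bool:
    """Treat an estimate as stable when the ROR lies within 1 +/- margin.

    Assumption: a symmetric margin on the ratio scale; the abstract only
    says "10% ROR error margin" without specifying the scale.
    """
    return (1.0 - margin) <= ror <= (1.0 + margin)


# Hypothetical pair of pooled odds ratios before and after an update.
ror = ratio_of_odds_ratios(or_original=0.80, or_updated=0.84)
print(f"ROR = {ror:.2f}, stable within 10%: {is_stable(ror)}")
```

Tightening `margin` to 0.05 corresponds to the stricter 5% scenario mentioned in the RESULTS, under the same assumption about how the margin is applied.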
ABSTRACT
BACKGROUND AND OBJECTIVE: It is unknown whether large language models (LLMs) may facilitate time- and resource-intensive text-related processes in evidence appraisal. The objective was to quantify the agreement of LLMs with human consensus in appraisal of scientific reporting (Preferred Reporting Items for Systematic reviews and Meta-Analyses [PRISMA]) and methodological rigor (A MeaSurement Tool to Assess systematic Reviews [AMSTAR]) of systematic reviews and design of clinical trials (PRagmatic Explanatory Continuum Indicator Summary 2 [PRECIS-2]) and to identify areas where collaboration between humans and artificial intelligence (AI) would outperform the traditional consensus process of human raters in efficiency. STUDY DESIGN AND SETTING: Five LLMs (Claude-3-Opus, Claude-2, GPT-4, GPT-3.5, Mixtral-8x22B) assessed 112 systematic reviews applying the PRISMA and AMSTAR criteria and 56 randomized controlled trials applying PRECIS-2. We quantified the agreement between human consensus and (1) individual human raters; (2) individual LLMs; (3) combined LLMs approach; (4) human-AI collaboration. Ratings were marked as deferred (undecided) in case of inconsistency between combined LLMs or between the human rater and the LLM. RESULTS: Individual human rater accuracy was 89% for PRISMA and AMSTAR, and 75% for PRECIS-2. Individual LLM accuracy ranged from 63% (GPT-3.5) to 70% (Claude-3-Opus) for PRISMA, 53% (GPT-3.5) to 74% (Claude-3-Opus) for AMSTAR, and 38% (GPT-4) to 55% (GPT-3.5) for PRECIS-2. Combined LLM ratings led to accuracies of 75%-88% for PRISMA (4%-74% deferred), 74%-89% for AMSTAR (6%-84% deferred), and 64%-79% for PRECIS-2 (29%-88% deferred). Human-AI collaboration resulted in the best accuracies from 89% to 96% for PRISMA (25/35% deferred), 91%-95% for AMSTAR (27/30% deferred), and 80%-86% for PRECIS-2 (76/71% deferred). CONCLUSION: Current LLMs alone appraised evidence worse than humans.
Human-AI collaboration may reduce workload for the second human rater for the assessment of reporting (PRISMA) and methodological rigor (AMSTAR) but not for complex tasks such as PRECIS-2.
ABSTRACT
BACKGROUND: Treatment decisions for persons with relapsing-remitting multiple sclerosis (RRMS) rely on clinical and radiological disease activity, the benefit-harm profile of drug therapy, and preferences of patients and physicians. However, there is limited evidence to support evidence-based personalized decision-making on how to adapt disease-modifying therapy treatments targeting no evidence of disease activity, while achieving better patient-relevant outcomes, fewer adverse events, and improved care. Serum neurofilament light chain (sNfL) is a sensitive measure of disease activity that captures and prognosticates disease worsening in RRMS. sNfL might therefore be instrumental for a patient-tailored treatment adaptation. We aim to assess whether 6-monthly sNfL monitoring in addition to usual care improves patient-relevant outcomes compared to usual care alone. METHODS: Pragmatic multicenter, 1:1 randomized, platform trial embedded in the Swiss Multiple Sclerosis Cohort (SMSC). All patients with RRMS in the SMSC for ≥ 1 year are eligible. We plan to include 915 patients with RRMS, randomly allocated to two groups with different care strategies, one of them new (group A) and one of them usual care (group B). In group A, 6-monthly monitoring of sNfL will together with information on relapses, disability, and magnetic resonance imaging (MRI) inform personalized treatment decisions (e.g., escalation or de-escalation) supported by pre-specified algorithms. In group B, patients will receive usual care with their usual 6- or 12-monthly visits. Two primary outcomes will be used: (1) evidence of disease activity (EDA3: occurrence of relapses, disability worsening, or MRI activity) and (2) quality of life (MQoL-54) using 24-month follow-up. The new treatment strategy with sNfL will be considered superior to usual care if either more patients have no EDA3, or their health-related quality of life increases. 
Data collection will be embedded within the SMSC using established trial-level quality procedures. DISCUSSION: MultiSCRIPT aims to be a platform where research and care are optimally combined to generate evidence to inform personalized decision-making in usual care. This approach aims to foster better personalized treatment and care strategies, at low cost and with rapid translation to clinical practice. TRIAL REGISTRATION: ClinicalTrials.gov NCT06095271. Registered on October 23, 2023.
Subjects
Biomarkers , Relapsing-Remitting Multiple Sclerosis , Neurofilament Proteins , Pragmatic Clinical Trials as Topic , Precision Medicine , Humans , Neurofilament Proteins/blood , Relapsing-Remitting Multiple Sclerosis/drug therapy , Relapsing-Remitting Multiple Sclerosis/diagnosis , Biomarkers/blood , Precision Medicine/methods , Switzerland , Randomized Controlled Trials as Topic , Clinical Decision-Making , Multicenter Studies as Topic , Treatment Outcome , Disease Progression , Time Factors , Predictive Value of Tests , Disability Evaluation , Quality of Life
ABSTRACT
OBJECTIVES: Trials within Cohorts (TwiCs) is a pragmatic design approach that may overcome frequent challenges of traditional randomized trials such as slow recruitment, burdensome consent procedures, or limited external validity. This scoping review aims to identify all randomized controlled trials using the TwiCs design and to summarize their design characteristics, ways to obtain informed consent, output, reported challenges and mitigation strategies. STUDY DESIGN AND SETTING: Systematic search of Medline, Embase, Cochrane, trial registries and citation tracking up to December 2022. TwiCs were defined as randomized trials embedded in a cohort with postrandomization consent for the intervention group and no specific postrandomization consent for the usual care control group. Information from identified TwiCs was extracted in duplicate from protocols, publications, and registry entries. We analyzed the information descriptively and qualitatively to highlight methodological challenges and solutions related to nonuptake of interventions and informed consent procedure. RESULTS: We identified a total of 46 TwiCs conducted between 2005 and 2022 in 14 different countries by a handful of research groups. The most common medical fields were oncology (11/46; 24%), infectious diseases (8/46; 17%), and mental health (7/46; 15%). A typical TwiCs was investigator-initiated (46/46; 100%), publicly funded (36/46; 78%), and recruited outpatients (27/46; 59%). Excluding eight pilot trials, only 16/38 (42%) TwiCs adjusted their calculated sample size for nonuptake of the intervention, anticipating a median nonuptake of 25% (interquartile range 10%-32%) in the experimental arm. Seventeen TwiCs (45%) planned analyses to adjust effect estimates for nonuptake. 
Regarding informed consent, we observed three patterns: 1) three separate consents for cohort participation, randomization, and intervention (17/46; 37%); 2) combined consent for cohort participation and randomization and a separate intervention consent (10/46; 22%); and 3) consent only for cohort participation and intervention (randomization consent not mentioned; 19/46; 41%). CONCLUSION: Existing TwiCs are globally scattered across a few research groups covering a wide range of medical fields and interventions. Despite the potential advantages, the number of TwiCs remains small. The variability in consent procedures and the possibility of substantial nonuptake of the intervention warrant further research to guide the planning, implementation, and analysis of TwiCs.
Subjects
Informed Consent , Randomized Controlled Trials as Topic , Research Design , Humans , Randomized Controlled Trials as Topic/methods , Informed Consent/statistics & numerical data , Cohort Studies
ABSTRACT
OBJECTIVES: To quantify the strength of statistical evidence of randomized controlled trials (RCTs) for novel cancer drugs approved by the Food and Drug Administration in the last 2 decades. STUDY DESIGN AND SETTING: We used data on overall survival (OS), progression-free survival, and tumor response for novel cancer drugs approved for the first time by the Food and Drug Administration between January 2000 and December 2020. We assessed strength of statistical evidence by calculating Bayes factors (BFs) for all available endpoints, and we pooled evidence using Bayesian fixed-effect meta-analysis for indications approved based on 2 RCTs. Strength of statistical evidence was compared among endpoints, approval pathways, lines of treatment, and types of cancer. RESULTS: We analysed the available data from 82 RCTs corresponding to 68 indications supported by a single RCT and 7 indications supported by 2 RCTs. Median strength of statistical evidence was ambiguous for OS (BF = 1.9; interquartile range [IQR] 0.5-14.5), and strong for progression-free survival (BF = 24,767.8; IQR 109.0-7.3 × 10^6) and tumor response (BF = 113.9; IQR 3.0-547,100). Overall, 44 indications (58.7%) were approved without clear statistical evidence for OS improvements and 7 indications (9.3%) were approved without statistical evidence for improvements on any endpoint. Strength of statistical evidence was lower for accelerated approval compared to nonaccelerated approval across all 3 endpoints. No meaningful differences were observed for line of treatment and cancer type. This analysis is limited to statistical evidence. We did not consider nonstatistical factors (eg, risk of bias, quality of the evidence). CONCLUSION: BFs offer novel insights into the strength of statistical evidence underlying cancer drug approvals. Most novel cancer drugs lack strong statistical evidence that they improve OS, and a few lack statistical evidence for efficacy altogether.
These cases require a transparent and clear explanation. When evidence is ambiguous, additional postmarketing trials could reduce uncertainty.
Subjects
Antineoplastic Agents , Bayes Theorem , Drug Approval , Neoplasms , Randomized Controlled Trials as Topic , United States Food and Drug Administration , Humans , Antineoplastic Agents/therapeutic use , Neoplasms/drug therapy , Neoplasms/mortality , United States , Progression-Free Survival , Treatment Outcome
ABSTRACT
OBJECTIVE: To evaluate the personal protective effects of wearing versus not wearing surgical face masks in public spaces on self-reported respiratory symptoms over a 14 day period. DESIGN: Pragmatic randomised superiority trial. SETTING: Norway. PARTICIPANTS: 4647 adults aged ≥18 years: 2371 were assigned to the intervention arm and 2276 to the control arm. INTERVENTIONS: Participants in the intervention arm were assigned to wear a surgical face mask in public spaces (eg, shopping centres, streets, public transport) over a 14 day period (mask wearing at home or work was not mentioned). Participants in the control arm were assigned to not wear a surgical face mask in public places. MAIN OUTCOME MEASURES: The primary outcome was self-reported respiratory symptoms consistent with a respiratory infection. Secondary outcomes included self-reported and registered covid-19 infection. RESULTS: Between 10 February 2023 and 27 April 2023, 4647 participants were randomised of whom 4575 (2788 women (60.9%); mean age 51.0 (standard deviation 15.0) years) were included in the intention-to-treat analysis: 2313 (50.6%) in the intervention arm and 2262 (49.4%) in the control arm. 163 events (8.9%) of self-reported symptoms consistent with respiratory infection were reported in the intervention arm and 239 (12.2%) in the control arm. The marginal odds ratio was 0.71 (95% confidence interval (CI) 0.58 to 0.87; P=0.001) favouring the face mask intervention. The absolute risk difference was -3.2% (95% CI -5.2% to -1.3%; P<0.001). No statistically significant effect was found on self-reported (marginal odds ratio 1.07, 95% CI 0.58 to 1.98; P=0.82) or registered covid-19 infection (effect estimate and 95% CI not estimable owing to lack of events in the intervention arm). CONCLUSION: Wearing a surgical face mask in public spaces over 14 days reduces the risk of self-reported symptoms consistent with a respiratory infection, compared with not wearing a surgical face mask.
TRIAL REGISTRATION: ClinicalTrials.gov NCT05690516.
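As a rough illustration of how the effect measures in the RESULTS above relate to the reported event proportions, the sketch below computes a crude odds ratio and risk difference from the two rounded percentages (8.9% and 12.2%). This is an assumption-laden approximation: the trial reports a marginal odds ratio from its own model, and the exact denominators are not given in the abstract, so the crude values only approximate the published 0.71 and -3.2%.

```python
def odds(p: float) -> float:
    """Convert a proportion to odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("proportion must be strictly between 0 and 1")
    return p / (1.0 - p)


def crude_odds_ratio(p_intervention: float, p_control: float) -> float:
    """Unadjusted odds ratio from two event proportions."""
    return odds(p_intervention) / odds(p_control)


def risk_difference(p_intervention: float, p_control: float) -> float:
    """Absolute risk difference (intervention minus control)."""
    return p_intervention - p_control


# Rounded event proportions as reported in the abstract.
p_mask, p_no_mask = 0.089, 0.122

print(f"crude OR: {crude_odds_ratio(p_mask, p_no_mask):.2f}")
print(f"risk difference: {100 * risk_difference(p_mask, p_no_mask):.1f} percentage points")
```

The small gap between these crude figures and the published estimates is expected, since the inputs are rounded and the trial's analysis model is not reproduced here.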
Subjects
COVID-19 , Masks , SARS-CoV-2 , Self Report , Adult , Aged , Female , Humans , Male , Middle Aged , Betacoronavirus , Coronavirus Infections/prevention & control , Coronavirus Infections/epidemiology , Coronavirus Infections/transmission , COVID-19/prevention & control , COVID-19/epidemiology , Norway/epidemiology , Pandemics/prevention & control , Viral Pneumonia/prevention & control , Viral Pneumonia/epidemiology , Viral Pneumonia/transmission , Respiratory Tract Infections/prevention & control
Subjects
COVID-19 , Masks , Humans , COVID-19/prevention & control , COVID-19/transmission , SARS-CoV-2
ABSTRACT
Consensus statements can be very influential in medicine and public health. Some of these statements use systematic evidence synthesis but others fail on this front. Many consensus statements use panels of experts to deduce perceived consensus through Delphi processes. We argue that stacking of panel members toward one particular position or narrative is a major threat, especially in absence of systematic evidence review. Stacking may involve financial conflicts of interest, but nonfinancial conflicts of strong advocacy can also cause major bias. Given their emerging importance, we describe here how such consensus statements may be misleading, by analyzing in depth a recent high-impact Delphi consensus statement on COVID-19 recommendations as a case example. We demonstrate that many of the selected panel members and at least 35% of the core panel members had advocated toward COVID-19 elimination (Zero-COVID) during the pandemic and were leading members of aggressive advocacy groups. These advocacy conflicts were not declared in the Delphi consensus publication, with rare exceptions. Therefore, we propose that consensus statements should always require rigorous evidence synthesis and maximal transparency on potential biases toward advocacy or lobbyist groups to be valid. While advocacy can have many important functions, its biased impact on consensus panels should be carefully avoided.
Subjects
COVID-19 , Consensus , Delphi Technique , Humans , COVID-19/prevention & control , SARS-CoV-2 , Conflict of Interest , Reproducibility of Results , Pandemics
ABSTRACT
RATIONALE: Novel therapeutic approaches are needed in stroke recovery. Whether pharmacological therapies are beneficial for enhancing stroke recovery is unclear. Dopamine is a neurotransmitter involved in motor learning, reward, and brain plasticity. Its prodrug levodopa is a promising agent for stroke recovery. AIM AND HYPOTHESIS: To investigate the hypothesis that levodopa, in addition to standardized rehabilitation therapy based on active task training, results in an enhancement of functional recovery in acute ischemic or hemorrhagic stroke patients compared to placebo. DESIGN: ESTREL (Enhancement of Stroke REhabilitation with Levodopa) is a randomized (ratio 1:1), multicenter, placebo-controlled, double-blind, parallel-group superiority trial. PARTICIPANTS: 610 participants (according to sample size calculation) with a clinically meaningful hemiparesis will be enrolled ⩽7 days after stroke onset. Key eligibility criteria include (i) in-hospital-rehabilitation required, (ii) capability to participate in rehabilitation, (iii) previous independence in daily living. INTERVENTION: Levodopa 100 mg/carbidopa 25 mg three times daily, administered for 5 weeks in addition to standardized rehabilitation. The study intervention will be initiated within 7 days after stroke onset. COMPARISON: Matching placebo plus standardized rehabilitation. OUTCOMES: The primary outcome is the between-group difference of the Fugl-Meyer-Motor Assessment (FMMA) total score measured 3 months after randomization. Secondary outcomes include patient-reported health and wellbeing (PROMIS 10 and 29), patient-reported assessment of improvement, Rivermead Mobility Index, modified Rankin Scale, National Institutes of Health Stroke Scale (NIHSS), and as measures of harm: mortality, recurrent stroke, and serious adverse events. 
CONCLUSION: The ESTREL trial will provide evidence of whether the use of Levodopa in addition to standardized rehabilitation in stroke patients leads to better functional recovery compared to rehabilitation alone.
ABSTRACT
OBJECTIVES: To assess to what extent the overall quality of evidence indicates changes to observed intervention effect estimates when new data become available. METHODS: We conducted a meta-epidemiological study. We obtained evidence from meta-analyses of randomized trials of Cochrane reviews addressing the same health-care question that were updated with inclusion of additional data between January 2016 and May 2021. We extracted the reported effect estimates with 95% confidence intervals (CIs) from meta-analyses and corresponding GRADE (Grading of Recommendations Assessment, Development, and Evaluation) assessments of any intervention comparison for the primary outcome in the first and the last updated review version. We considered the reported overall quality (certainty) of evidence (CoE) and specific evidence limitations (no, serious or very serious for risk of bias, imprecision, inconsistency, and/or indirectness). We assessed the change in pooled effect estimates between the original and updated evidence using the ratio of odds ratios (ROR), absolute ratio of odds ratios (aROR), ratio of standard errors (RoSE), direction of effects, and level of statistical significance. RESULTS: High CoE without limitations characterized 19.3% (n = 29) out of 150 included original Cochrane reviews. The update with additional data did not systematically change the effect estimates (mean ROR 1.00; 95% CI 0.99-1.02), which deviated 1.06-fold from the older estimates (median aROR; interquartile range [IQR]: 1.01-1.15), gained precision (median RoSE 0.87; IQR 0.76-1.00), and maintained the same direction with the same level of statistical significance in 93% (27 of 29) of cases. Lower CoE with limitations characterized 121 original reviews, which were graded as moderate CoE in 30.0% (45 of 150), low CoE in 32.0% (48 of 150), and very low CoE in 18.7% (28 of 150) of reviews.
Their update had larger absolute deviations (median aROR 1.12 to 1.33) and larger gains in precision (median RoSE 0.78-0.86) without clear and consistent differences between these categories of CoE. Changes in effect direction or statistical significance were also more common in the lower quality evidence, again with a similar extent across categories (without change in 75.6%, 64.6%, and 75.0% for moderate, low, very low CoE). As limitations increased, effect estimates deviated more (aROR 1.05 with zero, 1.11 with one, 1.25 with two, 1.24 with three limitations) and changes in direction or significance became more frequent (93.2% stable with no limitations, 74.5% with one, 68.2% with two, and 61.5% with three limitations). CONCLUSION: High-quality evidence without methodological deficiencies is trustworthy and stable, providing reliable intervention effect estimates when updated with new data. Evidence of moderate and lower quality may be equally prone to being unstable and cannot indicate if available effect estimates are true, exaggerated, or underestimated.
Subjects
Randomized Controlled Trials as Topic , Humans , Randomized Controlled Trials as Topic/standards , Evidence-Based Medicine/standards , Evidence-Based Medicine/methods , Meta-Analysis as Topic , Systematic Reviews as Topic/methods
ABSTRACT
BACKGROUND: Technological devices such as smartphones, wearables and virtual assistants enable health data collection, serving as digital alternatives to conventional biomarkers. We aimed to provide a systematic overview of emerging literature on 'digital biomarkers,' covering definitions, features and citations in biomedical research. METHODS: We analysed all articles in PubMed that used 'digital biomarker(s)' in title or abstract, considering any study involving humans and any review, editorial, perspective or opinion-based articles up to 8 March 2023. We systematically extracted characteristics of publications and research studies, and any definitions and features of 'digital biomarkers' mentioned. We described the most influential literature on digital biomarkers and their definitions using thematic categorisations of definitions considering the Food and Drug Administration Biomarkers, EndpointS and other Tools framework (ie, data type, data collection method, purpose of biomarker), analysing structural similarity of definitions by performing text and citation analyses. RESULTS: We identified 415 articles using 'digital biomarker' between 2014 and 2023 (median 2021). The majority (283 articles; 68%) were primary research. Notably, 287 articles (69%) did not provide a definition of digital biomarkers. Among the 128 articles with definitions, there were 127 different ones. Of these, 78 considered data collection, 56 data type, 50 purpose and 23 included all three components. Those 128 articles with a definition had a median of 6 citations, with the top 10 each presenting distinct definitions. CONCLUSIONS: The definitions of digital biomarkers vary significantly, indicating a lack of consensus in this emerging field. Our overview highlights key defining characteristics, which could guide the development of a more harmonised accepted definition.
Subjects
Biomarkers , Humans , Biomarkers/analysis , Biomedical Research
ABSTRACT
BACKGROUND: Increasingly, patients, clinicians, and regulators call for more evidence on the impact of innovative medicines on quality of life (QoL). We assessed the effects of disease-modifying therapies (DMTs) on QoL in people with multiple sclerosis (PwMS). METHODS: Randomized trials assessing approved DMTs in PwMS with results for at least one outcome referred to as "quality of life" were searched in PubMed and ClinicalTrials.gov. RESULTS: We identified 38 trials published between 1999 and 2023 with a median of 531 participants (interquartile range (IQR) 202 to 941; total 23,225). The evaluated DMTs were mostly interferon-beta (n = 10; 26%), fingolimod (n = 7; 18%), natalizumab (n = 5; 13%), and glatiramer acetate (n = 4; 11%). The 38 trials used 18 different QoL instruments, with up to 11 QoL subscale measures per trial (median 2; IQR 1-3). QoL was never the single primary outcome. We identified quantitative QoL results in 24 trials (63%), and narrative statements in 15 trials (39%). In 16 trials (42%), at least one of the multiple QoL results was statistically significant. The effect sizes of the significant quantitative QoL results were large (median Cohen's d 1.02; IQR 0.3-1.7; median Hedges' g 1.01; IQR 0.3-1.69) and ranged between d 0.14 and 2.91. CONCLUSIONS: Certain DMTs have the potential to positively impact QoL of PwMS, and the assessment and reporting of QoL is suboptimal with a multitude of diverse instruments being used. There is an urgent need that design and reporting of clinical trials reflect the critical importance of QoL for PwMS.
Subjects
Multiple Sclerosis , Quality of Life , Humans , Multiple Sclerosis/drug therapy , Multiple Sclerosis/psychology , Randomized Controlled Trials as Topic , Health Care Outcome Assessment , Immunologic Factors/therapeutic use
ABSTRACT
Background: Smoking cessation is challenging, even with established smoking cessation therapies. Preclinical studies and one clinical pilot study suggest that glucagon-like peptide-1 (GLP-1) analogues, a class of antidiabetic drugs, modulate addictive behaviours and nicotine craving. Previously, we reported the short-term results of a randomised, double-blind, placebo-controlled trial. Herein we report long-term abstinence rates and weight developments after 24 and 52 weeks. Methods: This single-centre, randomised, double-blind, placebo-controlled, parallel group trial was done at the University Hospital Basel in Switzerland. We randomly assigned (1:1) individuals with at least a moderate nicotine dependence willing to quit smoking to either a 12-week treatment with dulaglutide 1.5 mg or placebo subcutaneously once weekly in addition to standard of care smoking cessation therapy (varenicline 2 mg/day and behavioural counselling). After 12 weeks, dulaglutide or placebo injections were discontinued and the participants were followed up at week 24 and 52. The primary outcome of self-reported and biochemically confirmed point prevalence abstinence rate, and the secondary outcome of weight change were assessed at weeks 24 and 52. All participants who received one dose of the study drug were included in the intention to treat set and participants who received at least 10/12 doses of the study drug formed the per protocol set. The trial was registered at ClinicalTrials.gov, NCT03204396. Findings: Of the 255 participants who were randomly assigned between June 22, 2017 and December 3, 2020, 63% (80/127) (dulaglutide group) and 65% (83/128) (placebo group) were abstinent after 12 weeks. These abstinence rates declined to 43% (54/127) and 41% (52/128), respectively, after 24 weeks and to 32% (41/127) and 32% (41/128), respectively, after 52 weeks.
Post-cessation weight gain was prevented in the dulaglutide group (-1.0 kg, standard deviation [SD] 2.7) as opposed to the placebo group (+1.9 kg, SD 2.4) after 12 weeks. However, at week 24, increases in weight from baseline were observed in both groups (median, interquartile range [IQR]: dulaglutide: +1.5 kg, [-0.4, 4.1], placebo: +3.0 kg, [0.6, 4.6], baseline-adjusted difference in weight change -1.0 kg (97.5% CI [-2.16, 0.16])), and at week 52 the groups showed similar weight gain (median, IQR: dulaglutide: +2.8 kg [-0.4, 4.7], placebo: +3.1 kg [-0.4, 6.0], baseline-adjusted difference in weight change: -0.35 kg (95% CI [-1.72, 1.01])). In the follow-up period (week 12 to week 52) 51 (51%) and 48 (48%) treatment-unrelated adverse events were recorded in the dulaglutide and the placebo group, respectively. No treatment-related serious adverse events or deaths occurred. Interpretation: Dulaglutide does not improve long-term smoking abstinence, but has potential to counteract weight gain after quitting. However, 3 months of treatment did not have a sustained beneficial effect on weight at 1 year. As post-cessation weight gain is highest in the first year after quitting smoking, future studies should consider a longer treatment duration with a GLP-1 analogue in abstinent individuals. Funding: Swiss National Science Foundation, the Gottfried and Julia Bangerter-Rhyner Foundation, the Goldschmidt-Jacobson Foundation, the Hemmi-Foundation, the University of Basel, the Swiss Academy of Medical Sciences.
ABSTRACT
OBJECTIVES: Evidence-based research (EBR) is the systematic and transparent use of prior research to inform a new study so that it answers questions that matter in a valid, efficient, and accessible manner. This study surveyed experts about existing (e.g., citation analysis) and new methods for monitoring EBR and collected ideas about implementing these methods. STUDY DESIGN AND SETTING: We conducted a cross-sectional study via an online survey between November 2022 and March 2023. Participants were experts from the fields of evidence synthesis and research methodology in health research. Responses to open-ended questions were coded by recurring themes; descriptive statistics were used for quantitative questions. RESULTS: Twenty-eight expert participants suggested that citation analysis should be supplemented with content evaluation (not just what is cited but also in which context), content expert involvement, and assessment of the quality of cited systematic reviews. They also suggested that citation analysis could be facilitated with automation tools. They emphasized that EBR monitoring should be conducted by ethics committees and funding bodies before the research starts. Challenges identified for monitoring EBR implementation were resource constraints and a lack of clarity on who is responsible for such monitoring. CONCLUSION: Ideas proposed in this study for monitoring the implementation of EBR can be used to refine methods and define responsibility but should be further explored in terms of feasibility and acceptability. Different methods may be needed to determine if the use of EBR is improving over time.
Subjects
Research Design, Humans, Cross-Sectional Studies
ABSTRACT
BACKGROUND: Pragmatic trials are increasingly recognized for providing real-world evidence on treatment choices. OBJECTIVE: The objective of this study is to investigate the use and characteristics of pragmatic trials in multiple sclerosis (MS). METHODS: Systematic literature search and analysis of pragmatic trials on any intervention published up to 2022. Pragmatism was assessed with PRECIS-2 (PRagmatic Explanatory Continuum Indicator Summary-2). RESULTS: We identified 48 pragmatic trials published 1967-2022 that included a median of 82 participants (interquartile range (IQR) = 42-160) and typically assessed supportive care interventions (n = 41; 85%). Only seven trials assessed drugs (15%). Only three trials (6%) included >500 participants. Trials were mostly from the United Kingdom (n = 18; 38%), Italy (n = 6; 13%), the United States and Denmark (each n = 5; 10%). Primary outcomes were diverse, for example, quality-of-life, physical functioning, or disease activity. Only 1 trial (2%) used routinely collected data for outcome ascertainment. No trial was very pragmatic in all design aspects, but 14 trials (29%) were widely pragmatic (i.e. PRECIS-2 score ⩾ 4/5 in all domains). CONCLUSION: Only a few, mostly small, pragmatic trials exist in MS, and they rarely assess drugs. Despite the widely available routine data infrastructures, very few trials utilize them. There is an urgent need to leverage the potential of this pioneering study design to provide useful randomized real-world evidence.
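The "widely pragmatic" rule quoted in the abstract (a PRECIS-2 score of at least 4 out of 5 in all domains) can be illustrated with a minimal Python sketch. The nine domain names follow the PRECIS-2 tool; the function name, the dictionary layout, and the reading of "very pragmatic" as a score of 5 in every domain are assumptions made for illustration and are not taken from the abstract.

```python
# The nine PRECIS-2 domains, each scored from 1 (very explanatory)
# to 5 (very pragmatic).
PRECIS2_DOMAINS = (
    "eligibility", "recruitment", "setting", "organisation",
    "flexibility_delivery", "flexibility_adherence",
    "follow_up", "primary_outcome", "primary_analysis",
)


def classify_pragmatism(scores):
    """Label a trial from its per-domain PRECIS-2 scores (hypothetical helper).

    'widely pragmatic' uses the rule stated in the abstract (>= 4 in all
    domains); 'very pragmatic' (5 in all domains) is an assumed reading.
    """
    missing = [d for d in PRECIS2_DOMAINS if d not in scores]
    if missing:
        raise ValueError(f"missing domain scores: {missing}")
    values = [scores[d] for d in PRECIS2_DOMAINS]
    if any(not 1 <= v <= 5 for v in values):
        raise ValueError("PRECIS-2 scores must lie between 1 and 5")
    if all(v == 5 for v in values):
        return "very pragmatic"
    if all(v >= 4 for v in values):
        return "widely pragmatic"
    return "not widely pragmatic"


if __name__ == "__main__":
    # Invented example scores, not data from any trial in the review.
    example = dict.fromkeys(PRECIS2_DOMAINS, 4)
    example["setting"] = 5
    print(classify_pragmatism(example))
```

Requiring the threshold in every domain means a single explanatory design choice, such as strict eligibility criteria, is enough to move a trial out of the widely pragmatic group.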
Subjects
Multiple Sclerosis, Pragmatic Clinical Trials as Topic, Humans, Multiple Sclerosis/drug therapy, Multiple Sclerosis/therapy, Randomized Controlled Trials as Topic
ABSTRACT
BACKGROUND: Clinical trial registries allow assessment of deviations of published trials from their protocol, which may indicate a considerable risk of bias. However, since entries in many registries can be updated at any time, deviations may go unnoticed. We aimed to assess the frequency of changes to primary outcomes in different historical versions of registry entries, and how often they would go unnoticed if only deviations between published trial reports and the most recent registry entry are assessed. METHODS AND FINDINGS: We analyzed the complete history of changes of registry entries in all 1746 randomized controlled trials completed at German university medical centers between 2009 and 2017, with published results up to 2022, that were registered in ClinicalTrials.gov or the German WHO primary registry (German Clinical Trials Register; DRKS). Data were retrieved on 24 January 2022. We assessed deviations between registry entries and publications in a random subsample of 292 trials. We determined changes of primary outcomes (1) between different versions of registry entries at key trial milestones, (2) between the latest registry entry version and the results publication, and (3) changes that occurred after trial start with no change between latest registry entry version and publication (so that assessing the full history of changes is required for detection of changes). We categorized changes as major if primary outcomes were added, dropped, changed to secondary outcomes, or secondary outcomes were turned into primary outcomes. We also assessed (4) the proportion of publications transparently reporting changes and (5) characteristics associated with changes. Of all 1746 trials, 23% (n = 393) had a primary outcome change between trial start and latest registry entry version, with 8% (n = 142) being major changes, that is, primary outcomes were added, dropped, changed to secondary outcomes, or secondary outcomes were turned into primary outcomes. 
Primary outcomes in publications were different from the latest registry entry version in 41% of trials (120 of the 292 sampled trials; 95% confidence interval (CI) [35%, 47%]), with major changes in 18% (54 of 292; 95% CI [14%, 23%]). Overall, 55% of trials (161 of 292; 95% CI [49%, 61%]) had primary outcome changes at any timepoint over the course of a trial, with 23% of trials (67 of 292; 95% CI [18%, 28%]) having major changes. Changes only within registry records, with no apparent discrepancy between latest registry entry version and publication, were observed in 14% of trials (41 of 292; 95% CI [10%, 19%]), with 4% (13 of 292; 95% CI [2%, 7%]) being major changes. One percent of trials with a change reported this in their publication (2 of 161 trials; 95% CI [0%, 4%]). An exploratory logistic regression analysis indicated that trials were less likely to have a discrepant registry entry if they were registered more recently (odds ratio (OR) 0.74; 95% CI [0.69, 0.80]; p<0.001), were not registered on ClinicalTrials.gov (OR 0.41; 95% CI [0.23, 0.70]; p = 0.002), or were not industry-sponsored (OR 0.29; 95% CI [0.21, 0.41]; p<0.001). Key limitations include some degree of subjectivity in the categorization of outcome changes and inclusion of a single geographic region. CONCLUSIONS: In this study, we observed that changes to primary outcomes occur in 55% of trials, with 23% of trials having major changes. They are rarely transparently reported in the results publication and often not visible in the latest registry entry version. More transparency is needed, supported by deeper analysis of registry entries to make these changes more easily recognizable. Protocol registration: Open Science Framework (https://osf.io/t3qva; amendment in https://osf.io/qtd2b).
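The proportions above are reported with 95% confidence intervals, but the abstract does not say which interval method was used. As a minimal sketch, the following Python code computes an exact (Clopper-Pearson) interval for a binomial proportion using only the standard library; the choice of method and the function names are assumptions made for illustration, not the authors' stated approach.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); suitable for modest n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target, iterations=60):
    """Bisection on [0, 1] for f(p) = target, where f decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for k events in n trials."""
    if not 0 <= k <= n or n <= 0:
        raise ValueError("need 0 <= k <= n and n > 0")
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


if __name__ == "__main__":
    # Counts taken from the abstract: 120 of 292 sampled trials.
    low, high = clopper_pearson(120, 292)
    print(f"{120 / 292:.0%} (95% CI [{low:.0%}, {high:.0%}])")
```

Exact intervals stay within [0, 1] even for rare events such as 2 of 161, where a normal approximation would give a negative lower bound. Whether the printed values match the reported ones depends on the method and rounding the authors actually used.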
Subjects
Universities, Humans, Bias, Registries, Odds Ratio
ABSTRACT
BACKGROUND: Pragmatic trials provide decision-oriented, real-world evidence that is highly applicable and generalizable. The interest in real-world evidence is fueled by the assumption that effects in the "real world" are different to effects obtained under the artificial, controlled research conditions often used for traditional explanatory trials. However, it is unknown which features of pragmatism, generalizability, and applicability would be responsible for such differences. There is a need to provide empirical evidence and promote meta-research to answer these fundamental questions on the pragmatism of randomized trials and real-world evidence. Here, we describe the rationale and design of the PragMeta database, which pursues this goal (www.PragMeta.org). METHODS: PragMeta is a non-commercial, open data platform and infrastructure to facilitate research on pragmatic trials. It collects and shares data from published randomized trials that either have a specific design feature or other characteristic related to pragmatism or form clusters of trials addressing the same research question but differing in aspects of pragmatism. This lays the foundation to determine the relationship of various features of pragmatism, generalizability, and applicability with intervention effects or other trial characteristics. The database contains trial data actively collected for PragMeta but also allows existing datasets of trials collected for other purposes to be imported and linked, forming a large-scale meta-database. PragMeta captures data on (1) trial and design characteristics (e.g., sample size, population, intervention/comparison, outcome, longitudinal structure, blinding), (2) effect estimates, and (3) various determinants of pragmatism (e.g., the use of routinely collected data) and ratings from established tools used to determine pragmatism (e.g., the PRagmatic-Explanatory Continuum Indicator Summary 2; PRECIS-2). 
PragMeta is continuously provided online, inviting the meta-research community to collaborate, contribute, and/or use the database. As of April 2023, PragMeta contains data from > 700 trials, mostly with assessments on pragmatism. CONCLUSIONS: PragMeta will inform a better understanding of pragmatism and the generation and interpretation of real-world evidence.