Results 1 - 20 of 532
1.
Eur Urol Open Sci ; 69: 89-99, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39381595

ABSTRACT

Multiple randomized controlled trials (RCTs) have examined first-line pharmacological agents such as anticholinergics and β3 agonists for the management of overactive bladder (OAB) symptoms. Although earlier systematic reviews and (network) meta-analyses aimed to summarize the evidence, a substantial number of trials were not included, so a comprehensive and methodologically rigorous evaluation of the comparative effectiveness of all first-line pharmacological treatments is lacking. We aim to conduct a series of systematic reviews and network meta-analyses (NMAs) for a comprehensive assessment of the effectiveness and safety of first-line pharmacological treatments for OAB. Eligible studies will include RCTs comparing anticholinergics and β3 agonists to one another or to placebo in adults with OAB or detrusor overactivity. Pairs of reviewers with methodological training will independently evaluate candidate studies to determine eligibility and extract relevant data. We will incorporate patient-important outcomes, including urinary urgency episodes, urgency incontinence episodes, any type of incontinence episodes, urinary frequency, nocturia, and adverse events. We will conduct the NMAs using a frequentist framework and a graph theory model for each outcome. Analysis will follow rigorous methodologies, including handling of missing data and assessment of the risk of bias. We will conduct sensitivity and subgroup analyses and will apply the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach to rate evidence certainty. Our approach aims to address the knowledge gap in the treatment of OAB by synthesizing evidence from RCTs worldwide. We will employ robust statistical methods, including frequentist NMA, to generate clinically relevant and patient-important insights. Sensitivity and subgroup analyses will enhance the robustness and generalizability of our findings. Our reviews strive to inform evidence-based decisions in the management of OAB, ultimately to improve patient outcomes. Our study results may guide health policy decisions, such as reimbursement policies, and future studies in functional urology. The protocol for the review series is registered on PROSPERO as CRD42023266915.
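As a concrete illustration of this kind of analysis, the sketch below runs a frequentist, graph-theory-based NMA with the R package netmeta, which implements this class of model. The data frame, treatment names, and effect estimates are invented placeholders, not data from the review.

```r
# Illustrative frequentist NMA with netmeta, which implements the
# graph-theoretical model named in the protocol. 'd' holds invented
# contrast-level data: treatment effects (TE, e.g. change in urgency
# episodes/day) and their standard errors (seTE) per study.
library(netmeta)

d <- data.frame(
  studlab = c("Trial A", "Trial B", "Trial C", "Trial D"),
  treat1  = c("solifenacin", "mirabegron", "solifenacin", "mirabegron"),
  treat2  = c("placebo", "placebo", "mirabegron", "placebo"),
  TE      = c(-0.9, -0.7, -0.2, -0.8),   # hypothetical effects
  seTE    = c(0.25, 0.30, 0.35, 0.28)
)

nma <- netmeta(TE, seTE, treat1, treat2, studlab,
               data = d, sm = "MD", reference.group = "placebo")
summary(nma)    # relative effects of each drug vs. placebo
netgraph(nma)   # plot of the network geometry
```

The same call would be repeated per outcome (urgency, incontinence, frequency, nocturia, adverse events) by swapping in the corresponding contrast-level data.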

2.
Res Integr Peer Rev ; 9(1): 11, 2024 Oct 07.
Article in English | MEDLINE | ID: mdl-39370503

ABSTRACT

BACKGROUND: Preprints are scientific articles that have not undergone the peer-review process. They allow the latest evidence to be rapidly shared; however, it is unclear whether they can be confidently used for decision-making during a public health emergency. This study aimed to compare the data and quality of preprints released during the first four months of the 2022 mpox outbreak to their published versions. METHODS: Eligible preprints (n = 76) posted between May and August 2022 were identified through an established mpox literature database and followed to July 2024 for changes in publication status. The quality of preprints and published studies was assessed by two independent reviewers, using validated tools available for the study design (n = 33), to evaluate changes in quality. Tools included the Newcastle-Ottawa Scale; Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2); and JBI Critical Appraisal Checklists. The questions in each tool led to an overall quality assessment of high quality (no concerns with study design, conduct, and/or analysis), moderate quality (minor concerns), or low quality (several concerns). Changes in data (e.g. methods, outcomes, results) for preprint-published pairs (n = 60) were assessed by one reviewer and verified by a second. RESULTS: Preprints and published versions that could be evaluated for quality (n = 25 pairs) were mostly assessed as low quality. Minimal to no change in quality from preprint to published version was identified: all observational studies (10/10), most case series (6/7), and all surveillance data analyses (3/3) had no change in overall quality, while some diagnostic test accuracy studies (3/5) improved or worsened their quality assessment scores. Among all pairs (n = 60), outcomes were often added in the published version (58%) and less commonly removed (18%). Numerical results changed from preprint to published version in 53% of studies; however, most of these studies (22/32) had changes that were minor and did not impact the main conclusions. CONCLUSIONS: The minimal changes in quality, results, and main conclusions from preprint to published versions support the use of preprints, and the application of the same critical evaluation tools to preprints as to published studies, in decision-making during a public health emergency.

3.
Int J Nurs Pract ; : e13302, 2024 Oct 10.
Article in English | MEDLINE | ID: mdl-39389100

ABSTRACT

AIM: To evaluate the percentage of, and reasons for, disagreements in risk of bias (RoB) assessments for randomized controlled trials (RCTs) included in more than one Cochrane review in the field of nursing. BACKGROUND: Disagreement in RoB assessments reduces the credibility of the evidence summarized by systematic reviews (SRs). No study has evaluated the reliability of RoB assessments in nursing studies. DESIGN: Secondary data analysis based on research reports. METHODS: RCTs included in more than one review in the field of nursing were included. Disagreements in assessments were analysed, and possible reasons for them were investigated. RESULTS: Twenty-three RCTs were included in more than one review. Agreement between assessments ranged from 36.84% for "selective reporting" to 91.30% for "random sequence generation". "Allocation concealment" showed good agreement (84.21%). The items "blinding of participants and personnel", "blinding of outcome assessment" and "incomplete outcome data" showed poor agreement, with 50.00%, 58.82% and 66.67%, respectively. Most disagreements arose from the extraction of incomplete or differing information from the RCTs. CONCLUSIONS: The level of agreement between reviews varied greatly in the field of nursing. More complete and accurate information about RCTs needs to be collected when conducting an SR.

4.
Res Synth Methods ; 2024 Sep 26.
Article in English | MEDLINE | ID: mdl-39327803

ABSTRACT

RobotReviewer is a tool for automatically assessing the risk of bias in randomized controlled trials, but there is limited evidence of its reliability. We evaluated the agreement between RobotReviewer and humans regarding risk of bias assessment, based on 1955 randomized controlled trials. The risk of bias in these trials was assessed via two different approaches: (1) manually by human reviewers, and (2) automatically by RobotReviewer. The manual assessment was performed independently by two groups, with two additional rounds of verification. The agreement between RobotReviewer and humans was measured via the concordance rate and Cohen's kappa statistics, based on a binary classification of the risk of bias (low vs. high/unclear) as restricted by RobotReviewer. The concordance rates varied by domain, ranging from 63.07% to 83.32%. Cohen's kappa statistics showed poor agreement between humans and RobotReviewer for allocation concealment (κ = 0.25, 95% CI: 0.21-0.30) and blinding of outcome assessors (κ = 0.27, 95% CI: 0.23-0.31), and moderate agreement for random sequence generation (κ = 0.46, 95% CI: 0.41-0.50) and blinding of participants and personnel (κ = 0.59, 95% CI: 0.55-0.64). The findings demonstrate domain-specific differences in the level of agreement between RobotReviewer and humans. We suggest that RobotReviewer might be a useful auxiliary tool, but the specific manner of its integration as a complement to human assessment requires further discussion.
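For readers unfamiliar with the agreement statistics used here, the sketch below computes a concordance rate and Cohen's kappa (with 95% confidence boundaries) from a 2x2 cross-tabulation of binary RoB classifications (low vs. high/unclear). The cell counts are invented for illustration, not the study's data.

```r
# Concordance rate and Cohen's kappa for binary RoB classifications
# ("low" vs. "high/unclear"); the counts below are invented, not
# taken from the study.
library(psych)

tab <- matrix(c(700, 180,
                240, 835),
              nrow = 2, byrow = TRUE,
              dimnames = list(human = c("low", "high/unclear"),
                              robot = c("low", "high/unclear")))

sum(diag(tab)) / sum(tab)   # concordance (raw agreement) rate
cohen.kappa(tab)            # kappa with 95% confidence boundaries
```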

5.
Cancer Control ; 31: 10732748241286749, 2024.
Article in English | MEDLINE | ID: mdl-39307562

ABSTRACT

PURPOSE: This study enhances the efficiency of predicting complications in lung cancer patients receiving proton therapy by utilizing large language models (LLMs) and meta-analytical techniques for literature quality assessment. MATERIALS AND METHODS: We integrated systematic reviews with LLM evaluations, sourcing studies from Web of Science, PubMed, and Scopus, managed via EndNote X20. Inclusion and exclusion criteria ensured literature relevance. Techniques included meta-analysis, heterogeneity assessment using Cochran's Q test and I2 statistics, and subgroup analyses for different complications. Quality and bias risk were assessed using the PROBAST tool and further analyzed with models such as ChatGPT-4, Llama2-13b, and Llama3-8b. Evaluation metrics included AUC, accuracy, precision, recall, F1 score, and time efficiency (WPM). RESULTS: The meta-analysis revealed an overall effect size of 0.78 for model predictions, with high heterogeneity observed (I2 = 72.88%, P < 0.001). Subgroup analysis for radiation-induced esophagitis and pneumonitis revealed predictive effect sizes of 0.79 and 0.77, respectively, with a heterogeneity index (I2) of 0%, indicating that there were no significant differences among the models in predicting these specific complications. A literature assessment using LLMs demonstrated that ChatGPT-4 achieved the highest accuracy at 90%, significantly outperforming the Llama3 and Llama2 models, which had accuracies ranging from 44% to 62%. Additionally, LLM evaluations were conducted 3229 times faster than manual assessments, markedly enhancing both efficiency and accuracy. The risk assessment results identified nine studies as high risk, three as low risk, and one as unknown, confirming the robustness of ChatGPT-4 across various evaluation metrics. CONCLUSION: This study demonstrated that integrating large language models with meta-analysis techniques can significantly increase the efficiency of literature evaluations and reduce the time required for assessments, confirming that there are no significant differences among models in predicting post-proton-therapy complications in lung cancer patients.


Using Advanced AI to Improve Predictions of Treatment Side Effects in Lung Cancer: This research uses cutting-edge artificial intelligence (AI) techniques, including large language models like ChatGPT-4, to better predict potential side effects in lung cancer patients undergoing proton therapy. By analyzing extensive scientific literature quickly and accurately, this approach enhances the evaluation process, making it faster and more reliable at foreseeing treatment complications.
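A minimal sketch of the meta-analytic machinery described above, using the R package metafor: random-effects pooling of per-study effect sizes with Cochran's Q and I2 heterogeneity statistics. The yi/vi values are hypothetical, not the paper's data.

```r
# Random-effects pooling with Cochran's Q and I^2, mirroring the
# heterogeneity assessment described above. yi/vi are hypothetical
# per-study effect sizes and sampling variances.
library(metafor)

dat <- data.frame(
  yi = c(0.81, 0.74, 0.79, 0.77, 0.80),
  vi = c(0.004, 0.006, 0.005, 0.007, 0.004)
)

res <- rma(yi, vi, data = dat, method = "REML")
res        # pooled effect with 95% CI
res$QE     # Cochran's Q statistic
res$I2     # I^2, % of total variability due to heterogeneity
```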


Subject(s)
Lung Neoplasms , Proton Therapy , Humans , Lung Neoplasms/radiotherapy , Proton Therapy/adverse effects , Proton Therapy/methods
6.
Environ Evid ; 13(1): 1, 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-39294842

ABSTRACT

To inform environmental policy and practice, researchers estimate effects of interventions/exposures by conducting primary research (e.g., impact evaluations) or secondary research (e.g., evidence reviews). If these estimates are derived from poorly conducted/reported research, then they could misinform policy and practice by providing biased estimates. Many types of bias have been described, especially in the health and medical sciences. We aimed to map all types of bias from the literature that are relevant to estimating causal effects in the environmental sector. The types of bias were initially identified by using the Catalogue of Bias (catalogofbias.org) and reviewing key publications (n = 11) that previously collated and described biases. We identified 121 (out of 206) types of bias that were relevant to estimating causal effects in the environmental sector. We provide a general interpretation of every relevant type of bias, covered by seven risk-of-bias domains for primary research: risk of confounding biases; risk of post-intervention/exposure selection biases; risk of misclassified/mismeasured comparison biases; risk of performance biases; risk of detection biases; risk of outcome reporting biases; and risk of outcome assessment biases; and four domains for secondary research: risk of searching biases; risk of screening biases; risk of study appraisal and data coding/extraction biases; and risk of data synthesis biases. Our collation should help scientists and decision makers in the environmental sector be better aware of the nature of bias in the estimation of causal effects. Future research is needed to formalise the definitions of the collated types of bias, for example through decomposition using mathematical formulae.

7.
BMC Med Res Methodol ; 24(1): 219, 2024 Sep 27.
Article in English | MEDLINE | ID: mdl-39333867

ABSTRACT

BACKGROUND: There is a growing trend to include non-randomised studies of interventions (NRSIs) in rare events meta-analyses of randomised controlled trials (RCTs) to complement the evidence from the latter. An important consideration when combining RCTs and NRSIs is how to address potential bias and down-weighting of NRSIs in the pooled estimates. The aim of this study is to explore the use of a power prior approach in a Bayesian framework for integrating RCTs and NRSIs to assess the effect of rare events. METHODS: We proposed a method of specifying the down-weighting factor based on judgments of the relative magnitude (no information, and low, moderate, serious and critical risk of bias) of the overall risk of bias for each NRSI using the ROBINS-I tool. The methods were illustrated using two meta-analyses, with particular interest in the risk of diabetic ketoacidosis (DKA) in patients using sodium/glucose cotransporter-2 (SGLT-2) inhibitors compared with active comparators, and the association between low-dose methotrexate exposure and melanoma. RESULTS: No significant results were observed for these two analyses when the data from RCTs only were pooled (risk of DKA: OR = 0.82, 95% confidence interval (CI): 0.25-2.69; risk of melanoma: OR = 1.94, 95%CI: 0.72-5.27). When RCTs and NRSIs were directly combined without distinction in the same meta-analysis, both meta-analyses showed significant results (risk of DKA: OR = 1.50, 95%CI: 1.11-2.03; risk of melanoma: OR = 1.16, 95%CI: 1.08-1.24). Using Bayesian analysis to account for NRSI bias, there was a 90% probability of an increased risk of DKA in users receiving SGLT-2 inhibitors and a 91% probability of an increased risk of melanoma in patients using low-dose methotrexate. CONCLUSIONS: Our study showed that including NRSIs in a meta-analysis of RCTs for rare events could increase the certainty and comprehensiveness of the evidence. The estimates obtained from NRSIs are generally considered to be biased, and the possible influence of NRSIs on the certainty of the combined evidence needs to be carefully investigated.
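The down-weighting idea can be sketched on the log-odds-ratio scale with a normal approximation: raising a normal likelihood to a power alpha is equivalent to inflating its variance by 1/alpha. The alpha values keyed to ROBINS-I judgments and all numbers below are our illustrative assumptions, not the paper's specification.

```r
# Power-prior down-weighting sketch, normal approximation on the
# log-OR scale. The ROBINS-I -> alpha mapping and all estimates are
# illustrative assumptions, not the study's values.
alpha_map <- c(low = 0.8, moderate = 0.6, serious = 0.4, critical = 0.2)

rct  <- list(logOR = log(0.82), var = 0.37)   # hypothetical pooled RCT result
nrsi <- list(logOR = log(1.55), var = 0.02)   # hypothetical pooled NRSI result

alpha <- alpha_map[["serious"]]        # overall ROBINS-I judgment for the NRSI
nrsi_down_var <- nrsi$var / alpha      # L^alpha <=> variance inflated by 1/alpha

# Precision-weighted (inverse-variance) combination of the two sources:
w <- c(1 / rct$var, 1 / nrsi_down_var)
combined_logOR <- sum(w * c(rct$logOR, nrsi$logOR)) / sum(w)
combined_var   <- 1 / sum(w)

exp(combined_logOR)   # pooled OR after down-weighting the NRSI
pnorm(0, combined_logOR, sqrt(combined_var), lower.tail = FALSE)
# ^ posterior probability of OR > 1 under a flat prior
```

Smaller alpha (a more serious ROBINS-I judgment) widens the NRSI likelihood, so the pooled estimate leans more heavily on the RCT evidence.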


Subject(s)
Bayes Theorem , Meta-Analysis as Topic , Randomized Controlled Trials as Topic , Sodium-Glucose Transporter 2 Inhibitors , Humans , Randomized Controlled Trials as Topic/methods , Randomized Controlled Trials as Topic/statistics & numerical data , Sodium-Glucose Transporter 2 Inhibitors/therapeutic use , Sodium-Glucose Transporter 2 Inhibitors/adverse effects , Methotrexate/therapeutic use , Diabetic Ketoacidosis/chemically induced , Melanoma/drug therapy
8.
Syst Rev ; 13(1): 230, 2024 Sep 07.
Article in English | MEDLINE | ID: mdl-39244603

ABSTRACT

While undisputedly important, and part of any systematic review (SR) by definition, evaluation of the risk of bias within the included studies is one of the most time-consuming parts of performing an SR. In this paper, we describe a case study comprising an extensive analysis of risk of bias (RoB) and reporting quality (RQ) assessment from a previously published review (CRD42021236047). It included both animal and human studies, and the included studies compared baseline diseased subjects with controls, assessed the effects of investigational treatments, or both. We compared RoB and RQ between the different types of included primary studies. We also assessed the "informative value" of each of the separate elements for meta-researchers, based on the notion that variation in reporting may be more interesting for the meta-researcher than consistently high/low or reported/non-reported scores. In general, reporting of experimental details was low. This resulted in frequent unclear risk-of-bias scores. We observed this both for animal and for human studies and both for disease-control comparisons and investigations of experimental treatments. Plots and explorative chi-square tests showed that reporting was slightly better for human studies of investigational treatments than for the other study types. With the evidence reported as is, risk-of-bias assessments for systematic reviews have low informative value other than repeatedly showing that reporting of experimental details needs to improve in all kinds of in vivo research. Particularly for reviews that do not directly inform treatment decisions, it could be efficient to perform a thorough but partial assessment of the quality of the included studies, either of a random subset of the included publications or of a subset of relatively informative elements, comprising, e.g., ethics evaluation, conflicts of interest statements, study limitations, baseline characteristics, and the unit of analysis. This publication suggests several potential procedures.


Subject(s)
Bias , Research Design , Systematic Reviews as Topic , Humans , Animals
9.
BMC Med ; 22(1): 374, 2024 Sep 11.
Article in English | MEDLINE | ID: mdl-39256834

ABSTRACT

BACKGROUND: Genome-wide association studies have enabled Mendelian randomization analyses to be performed at an industrial scale. Two-sample summary data Mendelian randomization analyses can be performed using publicly available data by anyone who has access to the internet. While this has led to many insightful papers, it has also fuelled an explosion of poor-quality Mendelian randomization publications, which threatens to undermine the credibility of the whole approach. FINDINGS: We detail five pitfalls in conducting a reliable Mendelian randomization investigation: (1) inappropriate research question, (2) inappropriate choice of variants as instruments, (3) insufficient interrogation of findings, (4) inappropriate interpretation of findings, and (5) lack of engagement with previous work. We have provided a brief checklist of key points to consider when performing a Mendelian randomization investigation; this does not replace previous guidance, but highlights critical analysis choices. Journal editors should be able to identify many low-quality submissions and reject papers without requiring peer review. Peer reviewers should focus initially on key indicators of validity; if a paper does not satisfy these, then the paper may be meaningless even if it is technically flawless. CONCLUSIONS: Performing an informative Mendelian randomization investigation requires critical thought and collaboration between different specialties and fields of research.


Subject(s)
Mendelian Randomization Analysis , Mendelian Randomization Analysis/methods , Humans , Genome-Wide Association Study/methods
10.
J Clin Epidemiol ; 174: 111489, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39089422

ABSTRACT

OBJECTIVES: The Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement, first published in 2009, has been widely endorsed, and compliance is high in systematic reviews (SRs) of intervention studies. SRs of prevalence studies are increasing in frequency, but their characteristics and reporting quality have not been examined in large studies. Our objectives were to describe the characteristics of SRs of prevalence studies in adults, evaluate the completeness of reporting, and explore study-level characteristics associated with the completeness of reporting. STUDY DESIGN AND SETTING: We did a meta-research study. We searched 5 databases from January 2010 to December 2020 to identify SRs of prevalence studies in adult populations. We used the PRISMA 2009 checklist to assess completeness of reporting and recorded additional characteristics. We conducted a descriptive analysis of review characteristics and linear regression to assess the relationship between compliance with PRISMA and publication characteristics. RESULTS: We included 1172 SRs of prevalence studies. The number of reviews increased from 25 in 2010 to 273 in 2020. The median PRISMA score for SRs without meta-analysis was 17.5 of a maximum of 23, and for SRs with meta-analysis, 22 of a maximum of 25. Completeness of reporting, particularly for key items in the methods section, was suboptimal. Inclusion of a meta-analysis and reported use of a reporting or conduct guideline were the factors most strongly associated with increased compliance with PRISMA 2009. CONCLUSION: Reporting of SRs of prevalence was adequate for many PRISMA items. Nonetheless, this study highlights aspects for which special attention is needed. Development of a specific tool to assess the risk of bias in prevalence studies and an extension to the PRISMA statement could improve the conduct and reporting of SRs of prevalence studies.
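A sketch of the kind of study-level linear regression described above, relating PRISMA compliance to publication characteristics; the data frame and variable names are hypothetical placeholders, not the study's data set.

```r
# Study-level regression of PRISMA compliance on publication
# characteristics, mirroring the analysis described above. 'srs' and
# its columns are hypothetical placeholders.
srs <- data.frame(
  prisma_score   = c(17, 22, 15, 23, 19, 21),
  has_meta       = c(0, 1, 0, 1, 1, 1),   # includes a meta-analysis
  used_guideline = c(0, 1, 0, 1, 0, 1),   # reported use of a guideline
  pub_year       = c(2012, 2019, 2011, 2020, 2016, 2018)
)

fit <- lm(prisma_score ~ has_meta + used_guideline + pub_year, data = srs)
summary(fit)   # coefficients estimate associations with compliance
```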

11.
Nutrients ; 16(16)2024 Aug 17.
Article in English | MEDLINE | ID: mdl-39203880

ABSTRACT

BACKGROUND: The Foods with Function Claims system was introduced in Japan in April 2015 to make more products labeled with health functions available. A product's function claims must be substantiated by scientific evidence from clinical trials (CTs) or systematic reviews, but the quality of recent CTs is unclear. The purpose of this study was to evaluate the risk of bias (RoB), using the revised tool to assess risk of bias (RoB 2) published in 2018, of all recent CTs supporting notifications published on the Consumer Affairs Agency website. METHODS: A total of 38 submitted papers based on CTs published on the Consumer Affairs Agency website from 1 January 2023 to 30 June 2024 were eligible. The RoB 2 tool provides a framework for considering the risk of bias in the findings of any type of randomized trial; its five domains were used to evaluate the quality of the research methods. RESULTS: Eligible CTs were assessed as "low risk" (11%, n = 4), "medium risk" (13%, n = 5), and "high risk" (76%, n = 29). A considerable number of highly biased papers had been published. Bias occurred in all five domains, especially "bias in selection of the reported result" (Domain 5), which was the most serious ("high risk": 75%). Among factors potentially correlated with RoB, there was no significant difference (p = 0.785) in RoB 2 scores between for-profit and academic research according to the authors' affiliated organizations, no significant difference (p = 0.498) between the publication year categories 2000-2019 and 2020-2024, and no significant difference (p = 0.643) between English- and Japanese-language publications. CONCLUSION: Overall, the quality of the latest CTs submitted after 2023 was very low; bias occurred in all five domains and was most serious for "bias in selection of the reported result" (Domain 5).


Subject(s)
Bias , Randomized Controlled Trials as Topic , Japan , Humans , Functional Food , Food Labeling/methods , Research Design , Risk Assessment/methods
12.
J Evid Based Med ; 17(3): 550-558, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39107946

ABSTRACT

OBJECTIVE: An important consideration when combining RCTs and NRSIs is how to address their potential biases in the pooled estimates. This study aimed to propose a Bayesian bias-adjusted random effects model for the synthesis of evidence from RCTs and NRSIs. METHODS: We present a Bayesian bias-adjusted random effects model based on the power prior method, which combines the likelihood contribution of the NRSIs, raised to a power parameter alpha, with the likelihood of the RCT data, modeled with an additive bias. The method was illustrated using a meta-analysis of the association between low-dose methotrexate exposure and melanoma. We also combined the RCTs and NRSIs using a naïve data synthesis. RESULTS: The analysis including only RCTs had a posterior median and 95% credible interval (CrI) of 1.18 (0.31-4.04); the posterior probabilities of any harm (OR > 1.0) and of a meaningful association (OR > 1.15) were 0.61 and 0.52, respectively. The posterior median and 95% CrI based on the naïve data synthesis were 1.17 (0.96-1.47), and the posterior probabilities of any harm and of a meaningful association were 0.96 and 0.60, respectively. For the Bayesian bias-adjusted analysis, the median OR was 1.16 (95% CrI: 0.83-1.71), and the posterior probabilities of any and of a meaningful clinical association were 0.88 and 0.53, respectively. CONCLUSIONS: The results indicated that integrating NRSIs into a meta-analysis could increase the certainty of the body of evidence. However, directly combining RCTs and NRSIs in the same meta-analysis without distinction may lead to misleading conclusions.
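In symbols, a power-prior posterior of the kind described here can be written as follows. The notation is ours, and attaching the additive bias term delta to the NRSI contribution is one plausible reading of the model, so treat this as a sketch rather than the paper's exact specification.

```latex
% Sketch of a bias-adjusted power-prior posterior: the NRSI
% likelihood is raised to alpha in [0,1] (down-weighting), and an
% additive bias term delta shifts the biased contribution.
\[
  p(\theta \mid D_{\mathrm{RCT}}, D_{\mathrm{NRSI}})
  \;\propto\;
  L(\theta \mid D_{\mathrm{RCT}})\,
  L(\theta + \delta \mid D_{\mathrm{NRSI}})^{\alpha}\,
  p(\theta)\, p(\delta),
  \qquad 0 \le \alpha \le 1 .
\]
```

Setting alpha = 1 and delta = 0 recovers the naïve synthesis that pools both designs without distinction; alpha = 0 discards the NRSI evidence entirely.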


Subject(s)
Bayes Theorem , Bias , Randomized Controlled Trials as Topic , Humans , Melanoma/drug therapy , Methotrexate/therapeutic use , Methotrexate/administration & dosage , Models, Statistical , Meta-Analysis as Topic
14.
BMC Med Res Methodol ; 24(1): 169, 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39103781

ABSTRACT

BACKGROUND: Although aggregate data (AD) from randomised clinical trials (RCTs) are used in the majority of network meta-analyses (NMAs), other study designs (e.g., cohort studies and other non-randomised studies, NRS) can also be informative about relative treatment effects. Individual participant data (IPD), when available, are preferred to AD for adjusting for important participant characteristics and for better handling of heterogeneity and inconsistency in the network. RESULTS: We developed the R package crossnma to perform cross-format (IPD and AD) and cross-design (RCT and NRS) NMA and network meta-regression (NMR). The models are implemented as Bayesian three-level hierarchical models using Just Another Gibbs Sampler (JAGS) software within the R environment. The R package crossnma includes functions to automatically create the JAGS model, reformat the data (based on user input), assess convergence, and summarize the results. We demonstrate the workflow within crossnma using a network of six trials comparing four treatments. CONCLUSIONS: The R package crossnma enables the user to perform NMA and NMR with different data types in a Bayesian framework and facilitates the inclusion of all types of evidence, recognising differences in risk of bias.
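A sketch of the two-step crossnma workflow (build the JAGS model, then run it). The data objects are hypothetical, and the argument names follow our reading of the package documentation rather than a verified call; consult ?crossnma.model for the exact interface.

```r
# Two-step crossnma workflow sketch. 'ipd' (one row per participant)
# and 'agg' (one row per study arm) are hypothetical data sets; the
# argument names below follow our reading of the package
# documentation and should be checked against ?crossnma.model.
library(crossnma)

mod <- crossnma.model(trt = treat, study = id, outcome = outc, n = n,
                      design = design,   # study design, e.g. "rct" or "nrs"
                      prt.data = ipd,    # IPD part of the network
                      std.data = agg,    # aggregate-data part
                      reference = "placebo",
                      trt.effect = "random")
fit <- crossnma(mod)   # runs the three-level Bayesian model in JAGS
summary(fit)           # relative treatment effects vs. the reference
```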


Subject(s)
Bayes Theorem , Network Meta-Analysis , Software , Humans , Randomized Controlled Trials as Topic/methods , Randomized Controlled Trials as Topic/statistics & numerical data , Research Design , Algorithms , Meta-Analysis as Topic
15.
Cureus ; 16(7): e63581, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39087151

ABSTRACT

Our study aimed to establish the risk of selection bias in randomized controlled trials (RCTs) that were overall rated as having "low bias" risk according to Cochrane's Risk of Bias, version 2 (RoB 2) tool. A systematic literature search of current systematic reviews of RCTs was conducted. From the identified reviews, RCTs with overall "high bias" and "low bias" RoB 2 risk ratings were extracted. All RCTs were statistically tested for selection bias risk. From the test results, true positive, true negative, false positive, or false negative ratings were established, and the false omission rate (FOR) with a 95% confidence interval (CI) was computed. Subgroup analysis was conducted by computing the negative likelihood ratio (-LR) concerning RoB 2 domain 1 ratings: bias arising from the randomization process. A total of 1070 published RCTs (median publication year: 2018; interquartile range: 2013-2020) were identified and tested. We found that 7.61% of all "low bias" (RoB 2)-rated RCTs were at high selection bias risk (FOR 7.61%; 95% CI: 6.31%-9.14%) and that the likelihood of high selection bias risk in "low bias" (RoB 2 domain 1)-rated RCTs was 6% higher than that of low selection bias risk (-LR: 1.06; 95% CI: 0.98-1.15). These findings raise issues about the validity of "low bias" risk ratings using Cochrane's RoB 2 tool, as well as about the validity of some of the results from recently published RCTs. Our results also suggest that the likelihood of a "low bias" risk-rated body of clinical evidence being actually bias-free is low, and that generalization based on a limited, pre-specified set of appraisal criteria may not justify a high level of confidence that such evidence reflects the true treatment effect.
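The headline statistic here is a false omission rate, FOR = FN / (FN + TN), computed over the "low bias"-rated trials. The sketch below reconstructs a 2x2 margin consistent with the reported 7.61%; the exact counts are our back-calculation, not taken from the paper.

```r
# False omission rate: FOR = FN / (FN + TN), over RCTs rated "low
# bias" overall by RoB 2. FN = trials the statistical test flags as
# high selection-bias risk; TN = trials flagged as low risk. These
# counts are a back-calculation consistent with the reported 7.61%,
# not the paper's actual 2x2 table.
FN <- 74
TN <- 898

FN / (FN + TN)                     # 0.0761 -> 7.61%
binom.test(FN, FN + TN)$conf.int   # exact 95% CI for the proportion
```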

16.
Res Synth Methods ; 2024 Aug 23.
Article in English | MEDLINE | ID: mdl-39176994

ABSTRACT

Existing systems for automating the assessment of risk-of-bias (RoB) in medical studies are supervised approaches that require substantial training data to work well. However, recent revisions to RoB guidelines have resulted in a scarcity of available training data. In this study, we investigate the effectiveness of generative large language models (LLMs) for assessing RoB. Their application requires little or no training data and, if successful, could serve as a valuable tool to assist human experts during the construction of systematic reviews. Following Cochrane's latest guidelines (RoB2) designed for human reviewers, we prepare instructions that are fed as input to LLMs, which then infer the risk associated with a trial publication. We distinguish between two modelling tasks: directly predicting RoB2 from text; and employing decomposition, in which a RoB2 decision is made after the LLM responds to a series of signalling questions. We curate new testing data sets and evaluate the performance of four general- and medical-domain LLMs. The results fall short of expectations, with LLMs seldom surpassing trivial baselines. On the direct RoB2 prediction test set (n = 5993), LLMs perform akin to the baselines (F1: 0.1-0.2). In the decomposition task setup (n = 28,150), similar F1 scores are observed. Our additional comparative evaluation on RoB1 data also reveals results substantially below those of a supervised system. This testifies to the difficulty of solving this task based on (complex) instructions alone. Using LLMs as an assisting technology for assessing RoB2 thus currently seems beyond their reach.
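A schematic of the decomposition setup in R: the LLM is prompted with individual RoB 2 signalling questions, and a deterministic rule maps its answers to a domain judgement. The two questions and the mapping rule below are simplified illustrations (the LLM call itself is omitted), not the full RoB 2 algorithm.

```r
# Schematic of the decomposition setup: prompts for RoB 2 signalling
# questions plus a deterministic mapping from answers to a Domain 1
# judgement. Both the questions and the mapping are simplified
# illustrations; the actual LLM call is omitted.
signalling_qs <- c(
  q1 = "Was the allocation sequence random?",
  q2 = "Was the allocation sequence concealed until participants were assigned?"
)

build_prompt <- function(question, trial_text) {
  paste0("Answer Yes, Probably yes, Probably no, No, or No information.\n",
         "Question: ", question, "\n\nTrial report:\n", trial_text)
}

judge_domain1 <- function(a1, a2) {  # simplified, not the full algorithm
  yes <- c("Yes", "Probably yes"); no <- c("No", "Probably no")
  if (a1 %in% yes && a2 %in% yes) "Low risk"
  else if (a1 %in% no || a2 %in% no) "High risk"
  else "Some concerns"
}

cat(build_prompt(signalling_qs[["q1"]], "Patients were randomised 1:1 ..."))
judge_domain1("Yes", "No information")   # -> "Some concerns"
```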

17.
Toxicol Sci ; 201(2): 240-253, 2024 Oct 01.
Article in English | MEDLINE | ID: mdl-38964352

ABSTRACT

To support the development of appraisal tools for assessing the quality of in vitro studies, we developed a method for literature-based discovery of study assessment criteria, used the method to create an item bank of assessment criteria of potential relevance to in vitro studies, and analyzed the item bank to discern and critique current approaches for appraisal of in vitro studies. We searched four research indexes and included any document that identified itself as an appraisal tool for in vitro studies, was a systematic review that included a critical appraisal step, or was a reporting checklist for in vitro studies. We abstracted, normalized, and categorized all criteria applied by the included appraisal tools to create an "item bank" database of issues relevant to the assessment of in vitro studies. The resulting item bank consists of 676 unique appraisal concepts from 67 appraisal tools. We believe this item bank is the single most comprehensive resource of its type to date, should be of high utility for future tool development exercises, and provides a robust methodology for grounding tool development in the existing literature. Although we set out to develop an item bank specifically targeting in vitro studies, we found that many of the assessment concepts we discovered are readily applicable to other study designs. Item banks can be of significant value as a resource; however, there are important challenges in developing, maintaining, and extending them of which researchers should be aware.


Subject(s)
Research Design , Humans , In Vitro Techniques , Databases, Factual , Animals
18.
J Clin Epidemiol ; 174: 111480, 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39047919

ABSTRACT

OBJECTIVES: Current standards for systematic reviews (SRs) require adequate conduct and complete reporting of risk of bias (RoB) assessments of the individual studies included in the review. We investigated the conduct and reporting of RoB assessments in a sample of SRs of interventions for persons with cerebral palsy (CP). STUDY DESIGN AND SETTING: We included SRs published from 2014 to 2021. Authors worked in pairs to independently extract data on the characteristics of the SRs and to rate their conduct and reporting. The conduct of RoB assessment was appraised with the three AMSTAR-2 items related to RoB assessment. Reporting completeness was evaluated using the two items related to RoB assessment within studies in the PRISMA 2020 guidelines. We used descriptive statistics to report the consensus data, in accordance with our protocol. RESULTS: We included 145 SRs. Among the 128 (88.3%) SRs that assessed RoB, the standards for AMSTAR-2 item 9 (use of an adequate RoB tool) were partially or fully satisfied in 73 (57.0%). Across the 128 SRs that assessed RoB, 46 (35.9%) accounted for RoB in interpreting the SR's findings and, of the 49 that included a meta-analysis, 11 (22.4%) discussed the impact of RoB on the meta-analysis. Of the 128 SRs, 123 (96.1%) named the RoB tool used for at least one of the study designs they included, 96 (75.0%) specified the RoB items assessed, 89 (69.5%) reported the findings for each item, 81 (63.2%) fully reported the processes for RoB assessment, 68 (53.1%) reported how an overall RoB judgment was reached, and 74 (57.8%) reported an overall RoB assessment for every study. CONCLUSION: The selection and application of RoB tools in this sample of SRs about interventions for CP are comparable to those reported in other recent studies. However, most SRs in this sample did not fully meet the appraisal standards of AMSTAR-2 regarding the adequacy of the RoB tool applied and other aspects of RoB assessment conduct; Cochrane SRs were a notable exception. Overall, reporting of RoB assessments was somewhat better than conduct, perhaps reflecting the more widespread uptake of the PRISMA guidelines. Our findings may be generalizable to some extent, considering the extensive literature reporting widespread inadequacies in health care-related intervention SRs and reports from other specialties documenting similar RoB assessment deficiencies. As such, this study should remind authors, peer reviewers, and journal editors to follow the RoB assessment reporting guidelines of PRISMA 2020 and to understand the corresponding critical appraisal standards of AMSTAR-2. We recommend a shift of focus from documenting inadequate RoB assessments and well-known deficiencies in other components of SRs towards implementing changes to address these problems, along with plans to evaluate their effectiveness.

19.
J Clin Epidemiol ; 174: 111460, 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-39025376

ABSTRACT

OBJECTIVES: Risk of bias (RoB) assessment is a critical part of any systematic review (SR). There are multiple tools available for assessing RoB of the studies included in a SR. The conduct of these assessments in intervention SRs are addressed by three items in AMSTAR-2, considered the preferred tool for critically appraising an intervention SR. This study focuses attention on item 9, which assesses the ability of a RoB tool to adequately address sources of bias, particularly in randomized trials (RCTs) and nonrandomized studies of interventions (NRSI). Our main objective is to report the detailed results of our examination of both Cochrane and non-Cochrane RoB tools and distinguish those that meet AMSTAR-2 item 9 appraisal standards. STUDY DESIGN AND SETTING: We identified critical appraisal tools reported in a sample of 126 SRs reporting on interventions for persons with cerebral palsy published from 2014 to 2021. Eligible tools were those that had been used to assess the primary studies included in these SRs and for which assessment results were reported in enough detail to allow appraisal of the tool. We identified the version of the tool applied as original, modified, or novel and established the applicable study designs as intended by the tools' developers. We then evaluated the potential ability of these tools to assess the four sources of bias specified in AMSTAR-2 item 9 for RCTs and NRSI. We adapted item 9 to appraise tools applied to single-case experimental designs, which we also encountered in this sample of SRs. RESULTS: Most of the eligible tools are recognized by name in the published literature and were applied in the original or modified form. Modifications were applied with considerable variability across the sample. Of the 37 tools we examined, those judged to fully meet the appraisal standards for RCTs included all the Cochrane tools, the original and modified Downs and Black Checklist, and the quality assessment standard for a cross-over study by Ding et al.; for NRSI, these included all the Cochrane tools, the original and modified Downs and Black Checklist, and the Research Triangle Institute item bank on Risk of Bias and Precision of Observational Studies for NRSI. In general, tools developed for a specific study design were judged to meet the appraisal standards fully or partially for that design. These results suggest it is unlikely that a single tool will be adequate by AMSTAR-2 item 9 appraisal standards for an intervention SR that includes studies of various designs. CONCLUSION: To our knowledge, this is the first resource providing SR authors with practical information about the appropriateness and adequacy of RoB tools by the appraisal standards specified in AMSTAR-2 item 9 for RCTs and NRSI. We propose similar methods for appraisal of tools applied to single-case experimental designs. We encourage authors to seek contemporary RoB tools developed for use in healthcare-related intervention SRs and designed to evaluate relevant study design features. The tools should address attributes unique to the review topic and research question but not be subjected to unjustified and excessive modifications. We promote recognition of the potential shortcomings of both Cochrane and non-Cochrane RoB tools, even those that perform well by AMSTAR-2 item 9 appraisal standards.

20.
J Clin Epidemiol ; 174: 111486, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39084579

ABSTRACT

An assessment of the validity of studies is an essential component of most evidence syntheses (systematic reviews) to understand the risk of bias (ROB) and applicability of the evidence. A formal validity assessment requires a structured and comprehensive approach, which can be implemented using an assessment tool, specifically developed for this purpose. Many different tools are available, making it difficult for researchers to choose the best tool for their evidence synthesis. We have established the LATITUDES Network to assist researchers in identifying the most appropriate tool to use in their evidence synthesis and to support researchers using these tools. The LATITUDES website (www.latitudes-network.org) includes a searchable library of validity assessment tools designed for use in evidence syntheses, bringing tools together in one place and providing researchers with clear information on suitable tools, categorized by study design. The website also provides links to training on the process of validity assessment and a list of tools currently under development. To be included in the LATITUDES library, tools must meet the following criteria: be designed for use in evidence syntheses; assess multidimensional aspects of validity of individual studies or reviews; and be developed for use by the wider research community rather than for a single research group. We highlight 'key' tools, those that are considered to be the most robust and reliable tools based on prespecified criteria agreed in conjunction with our advisory board, an international group of experts in the area of evidence synthesis and ROB tools.
