ABSTRACT
OBJECTIVES: To evaluate the impact of guidance and training on the inter-rater reliability (IRR), inter-consensus reliability (ICR), and evaluator burden of the Risk of Bias (RoB) in Non-randomized Studies (NRS) of Interventions (ROBINS-I) tool, and the RoB instrument for NRS of Exposures (ROB-NRSE). STUDY DESIGN AND SETTING: In a before-and-after study, seven reviewers appraised the RoB using ROBINS-I (n = 44) and ROB-NRSE (n = 44), before and after guidance and training. We used Gwet's AC1 statistic to calculate IRR and ICR. RESULTS: After guidance and training, the IRR and ICR of the overall bias domain of ROBINS-I and ROB-NRSE improved significantly; many individual domains showed either a significant improvement (IRR and ICR of ROB-NRSE; ICR of ROBINS-I) or a nonsignificant one (IRR of ROBINS-I). Evaluator burden decreased significantly after guidance and training for ROBINS-I, whereas for ROB-NRSE there was a slight, nonsignificant increase. CONCLUSION: Overall, guidance and training benefited both tools. We highly recommend providing reviewers with guidance and training prior to RoB assessments, and that future research investigate which aspects of guidance and training are most effective.
Subjects
Biomedical Research/standards , Epidemiologic Research Design , Observer Variation , Peer Review/standards , Research Design/standards , Research Personnel/education , Adult , Biomedical Research/statistics & numerical data , Canada , Cross-Sectional Studies , Female , Humans , Male , Middle Aged , Psychometrics/methods , Reproducibility of Results , Research Design/statistics & numerical data , United Kingdom
ABSTRACT
BACKGROUND: A new tool, "risk of bias (ROB) instrument for non-randomized studies of exposures (ROB-NRSE)," was recently developed. It is important to establish consistency in its application and interpretation across review teams. In addition, it is important to understand whether specialized training and guidance will improve the reliability of the assessment results. Therefore, the objective of this cross-sectional study is to establish the inter-rater reliability (IRR), inter-consensus reliability (ICR), and concurrent validity of the new ROB-NRSE tool. Furthermore, as this is a relatively new tool, it is important to understand the barriers to using it (e.g., the time needed to conduct assessments and reach consensus, i.e., evaluator burden). METHODS: Reviewers from four participating centers will appraise the ROB of a sample of NRSE publications using the ROB-NRSE tool in two stages. For IRR and ICR, two pairs of reviewers will assess the ROB for each NRSE publication. In the first stage, reviewers will assess the ROB without any formal guidance. In the second stage, reviewers will be provided customized training and guidance. At each stage, each pair of reviewers will resolve conflicts and arrive at a consensus. To calculate the IRR and ICR, we will use Gwet's AC1 statistic. For concurrent validity, reviewers will appraise a sample of NRSE publications using both the Newcastle-Ottawa Scale (NOS) and the ROB-NRSE tool. We will analyze the concordance between the two tools for similar domains and for the overall judgments using Kendall's tau coefficient. To measure evaluator burden, we will assess the time taken to apply the ROB-NRSE tool (without and with guidance), and the NOS. To assess the impact of customized training and guidance on the evaluator burden, we will use generalized linear models. We will use Microsoft Excel and SAS 9.4 to manage and analyze study data, respectively.
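The reliability analyses above rely on Gwet's AC1 statistic, an agreement coefficient that is more stable than Cohen's kappa when category prevalence is skewed (as is common for RoB judgments clustered in one category). A minimal sketch of the two-rater, one-category-per-item case is shown below; the function name and example ratings are illustrative only, not taken from the study, which used SAS 9.4 rather than Python:

```python
from collections import Counter

def gwet_ac1(ratings_a, ratings_b):
    """Gwet's AC1 agreement coefficient for two raters with categorical ratings.

    AC1 = (pa - pe) / (1 - pe), where pa is the observed proportion of
    agreement and pe is chance agreement based on average category prevalence.
    Requires at least two distinct categories across the two raters.
    """
    assert len(ratings_a) == len(ratings_b) and ratings_a
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    q = len(categories)
    # Observed agreement: proportion of items both raters rated identically
    pa = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Average prevalence pi_k of each category across both raters
    counts = Counter(ratings_a) + Counter(ratings_b)
    pi = {k: counts[k] / (2 * n) for k in categories}
    # Chance agreement under Gwet's model
    pe = sum(p * (1 - p) for p in pi.values()) / (q - 1)
    return (pa - pe) / (1 - pe)
```

For example, two raters who agree on 5 of 6 RoB judgments ("low"/"high") yield an AC1 of about 0.68, which would fall in the "substantial" band of common interpretation scales.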
DISCUSSION: The quality of evidence from systematic reviews that include NRSE depends partly on the study-level ROB assessments. The findings of this study will contribute to an improved understanding of ROB-NRSE and how best to use it.
Subjects
Bias , Consensus , Reproducibility of Results , Research Design , Cross-Sectional Studies , Humans
ABSTRACT
OBJECTIVE: To assess the real-world interrater reliability (IRR), interconsensus reliability (ICR), and evaluator burden of the Risk of Bias (RoB) in Nonrandomized Studies (NRS) of Interventions (ROBINS-I) and the ROB Instrument for NRS of Exposures (ROB-NRSE) tools. STUDY DESIGN AND SETTING: A six-center cross-sectional study with seven reviewers (two reviewer pairs) assessing the RoB using ROBINS-I (n = 44 NRS) or ROB-NRSE (n = 44 NRS). We used Gwet's AC1 statistic to calculate the IRR and ICR. To measure the evaluator burden, we assessed the total time taken to apply the tool and reach a consensus. RESULTS: For ROBINS-I, both IRR and ICR for individual domains ranged from poor to substantial agreement. IRR and ICR on overall RoB were poor. The evaluator burden was 48.45 min (95% CI 45.61 to 51.29). For ROB-NRSE, the IRR and ICR for the majority of domains were poor, while the rest ranged from fair to perfect agreement. IRR and ICR on overall RoB were slight and poor, respectively. The evaluator burden was 36.98 min (95% CI 34.80 to 39.16). CONCLUSIONS: We found both tools to have low reliability, although ROBINS-I was slightly more reliable. Measures to increase agreement between raters (e.g., detailed training, supportive guidance material) may improve reliability and decrease evaluator burden.