Achieving high inter-rater reliability in establishing data labels: a retrospective chart review study.
Wu, Guosong; Eastwood, Cathy; Sapiro, Natalie; Cheligeer, Cheligeer; Southern, Danielle A; Quan, Hude; Xu, Yuan.
Affiliation
  • Wu G; Centre for Health Informatics, Department of Community Health Sciences, University of Calgary, Calgary, Alberta, Canada.
  • Eastwood C; Centre for Health Informatics, Department of Community Health Sciences, University of Calgary, Calgary, Alberta, Canada.
  • Sapiro N; Centre for Health Informatics, Department of Community Health Sciences, University of Calgary, Calgary, Alberta, Canada.
  • Cheligeer C; Alberta Health Services, Calgary, Alberta, Canada.
  • Southern DA; Centre for Health Informatics, Department of Community Health Sciences, University of Calgary, Calgary, Alberta, Canada.
  • Quan H; Centre for Health Informatics, Department of Community Health Sciences, University of Calgary, Calgary, Alberta, Canada.
  • Xu Y; Centre for Health Informatics, Department of Community Health Sciences, University of Calgary, Calgary, Alberta, Canada. yuxu@ucalgary.ca.
BMJ Open Qual; 13(2). 2024 Apr 17.
Article in En | MEDLINE | ID: mdl-38631818
ABSTRACT

BACKGROUND:

In medical research, the effectiveness of machine learning algorithms depends heavily on the accuracy of labeled data. This study aimed to assess inter-rater reliability (IRR) in a retrospective electronic medical chart review used to create high-quality labeled data on comorbidities and adverse events (AEs).

METHODS:

Six registered nurses with diverse clinical backgrounds reviewed patient charts and extracted data on 20 predefined comorbidities and 18 AEs. All reviewers underwent four iterative rounds of training aimed at enhancing accuracy and fostering consensus. Periodic monitoring was conducted at the beginning, middle, and end of the testing phase to ensure data quality. Weighted kappa coefficients were calculated with their associated 95% confidence intervals (CIs), as illustrated in the sketch below.
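As a rough illustration of the statistics named above, the following sketch computes a two-rater weighted kappa with a bootstrap 95% CI on simulated labels. Note that the study reports Conger's multi-rater kappa, which common Python libraries such as scikit-learn do not implement; scikit-learn's two-rater Cohen's kappa is used here purely as an illustrative stand-in, and all data are hypothetical.

    import numpy as np
    from sklearn.metrics import cohen_kappa_score

    rng = np.random.default_rng(42)

    # Hypothetical binary labels (0 = absent, 1 = present) from two reviewers
    # rating one comorbidity across 70 charts; the second reviewer agrees
    # with the first roughly 90% of the time.
    rater_a = rng.integers(0, 2, size=70)
    flip = rng.random(70) < 0.10
    rater_b = np.where(flip, 1 - rater_a, rater_a)

    kappa = cohen_kappa_score(rater_a, rater_b, weights="linear")

    # Nonparametric bootstrap over charts for an approximate 95% CI.
    boot = []
    for _ in range(2000):
        idx = rng.integers(0, 70, size=70)
        boot.append(cohen_kappa_score(rater_a[idx], rater_b[idx],
                                      weights="linear"))
    lo, hi = np.percentile(boot, [2.5, 97.5])

    print(f"weighted kappa = {kappa:.2f} (95% CI {lo:.2f} to {hi:.2f})")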

RESULTS:

Seventy patient charts were reviewed. The overall agreement, measured by Conger's Kappa, was 0.80 (95% CI 0.78-0.82). IRR scores remained consistently high (ranging from 0.70 to 0.87) throughout each phase.
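For reference, Conger's kappa, like other chance-corrected agreement statistics, has the standard form (a textbook definition, not quoted from the article):

    \kappa = \frac{P_o - P_e}{1 - P_e}

where P_o is the observed proportion of agreement and P_e is the agreement expected by chance; Conger's variant generalizes Cohen's two-rater kappa to multiple raters by estimating P_e from each rater's individual marginal distributions. On the widely used Landis and Koch scale, the reported value of 0.80 sits at the boundary between "substantial" and "almost perfect" agreement.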

CONCLUSION:

Our study suggests that the detailed chart review manual and structured training regimen resulted in a consistently high level of agreement among our reviewers during the chart review process. This establishes a robust foundation for generating high-quality labeled data, thereby enhancing the potential for developing accurate machine learning algorithms.

Full text: 1 Database: MEDLINE Main subject: Data Accuracy Language: En Year of publication: 2024 Document type: Article