Pathologists should probably forget about kappa. Percent agreement, diagnostic specificity and related metrics provide more clinically applicable measures of interobserver variability.

Marchevsky, Alberto M; Walts, Ann E; Lissenberg-Witte, Birgit I; Thunnissen, Erik

Affiliation

Marchevsky AM; Department of Pathology & Laboratory Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, United States of America. Electronic address: Alberto.Marchevsky@cshs.org.
Walts AE; Department of Pathology & Laboratory Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, United States of America.
Lissenberg-Witte BI; Department of Epidemiology and Data Science, UMC, Vrije Universiteit Amsterdam, the Netherlands.
Thunnissen E; Department of Pathology, UMC, Vrije Universiteit Amsterdam, the Netherlands.

Ann Diagn Pathol; 47: 151561, 2020 Aug.

Article in English | MEDLINE | ID: mdl-32623312

ABSTRACT

Kappa statistics have been widely used in the pathology literature to compare interobserver diagnostic variability (IOV) among different pathologists, but there has been limited discussion about the clinical significance of kappa scores. Five representative and recent pathology papers were queried using clinically relevant specific questions to learn how IOV was evaluated and how the clinical applicability of results was interpreted. The papers supported our anecdotal impression that pathologists usually assess IOV using Cohen's or Fleiss' kappa statistics and interpret the results using some variation of the scale proposed by Landis and Koch. The papers did not cite or propose specific guidelines to comment on the clinical applicability of results. The solutions proposed to decrease IOV included the development of better diagnostic criteria and additional educational efforts, but the possibility that the entities themselves represented a continuum of morphologic findings rather than distinct diagnostic categories was not considered in any of the studies. A dataset from a previous study of IOV reported by Thunnissen et al. was recalculated to estimate percent agreement among 19 international lung pathologists for the diagnosis of 74 challenging lung neuroendocrine neoplasms. Kappa scores and diagnostic sensitivity, specificity, positive and negative predictive values were calculated using the majority consensus diagnosis for each case as the gold reference diagnosis for that case. Diagnostic specificity estimates among multiple pathologists were >90%, although kappa scores were considerably more variable. We explain why kappa scores are of limited clinical applicability in pathology and propose the use of positive and negative percent agreement and diagnostic specificity against a gold reference diagnosis to evaluate IOV among two and multiple raters, respectively.
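
The calculation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the category labels "A" and "B", and the toy ratings are all assumptions made for illustration, and ties in the majority vote are broken arbitrarily (first label encountered). Each rater's own call counts toward the consensus here, which may differ from how the study handled it.

```python
from collections import Counter

def majority_consensus(case_ratings):
    """Most frequent diagnosis for one case (ties: first label encountered)."""
    return Counter(case_ratings).most_common(1)[0][0]

def diagnostic_metrics(ratings, rater, category):
    """Score one rater against the majority consensus for one category.

    ratings: list of cases, each a list with one diagnosis per rater.
    """
    tp = fp = tn = fn = 0
    for case in ratings:
        reference = majority_consensus(case) == category
        call = case[rater] == category
        if reference and call:
            tp += 1
        elif reference:
            fn += 1
        elif call:
            fp += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

def percent_agreement(a, b):
    """Fraction of cases on which two raters give the same diagnosis."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohen_kappa(a, b):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e) if p_e != 1 else float("nan")

# Toy data (hypothetical): 6 cases, 3 raters, two diagnostic categories.
ratings = [
    ["A", "A", "A"],
    ["A", "A", "B"],
    ["B", "B", "B"],
    ["B", "A", "B"],
    ["B", "B", "B"],
    ["A", "B", "A"],
]
rater0 = [case[0] for case in ratings]
rater1 = [case[1] for case in ratings]

metrics = diagnostic_metrics(ratings, rater=2, category="A")
agreement = percent_agreement(rater0, rater1)
kappa = cohen_kappa(rater0, rater1)
```

In this toy example the two raters agree on 4 of 6 cases, yet kappa is much lower because half of that agreement is expected by chance. This is the kind of gap between percent agreement and kappa that the abstract points to.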

Subjects

Benchmarking/statistics & numerical data; Diagnostic Techniques and Procedures/statistics & numerical data; Lung/pathology; Neuroendocrine Tumors/diagnosis; Pathologists/standards; Benchmarking/methods; Consensus; Diagnostic Techniques and Procedures/trends; Evidence-Based Medicine/methods; Humans; Observer Variation; Pathology/standards; Predictive Value of Tests; Research Design/trends; Sensitivity and Specificity; Statistics as Topic

Keywords

Diagnostic accuracy; Evidence-based pathology; Interobserver variability; Kappa statistics

Full text: 1 | Database: MEDLINE | Main subject: Neuroendocrine Tumors / Benchmarking / Diagnostic Techniques and Procedures / Pathologists / Lung | Type of study: Diagnostic_studies / Guideline / Prognostic_studies / Qualitative_research | Limits: Humans | Language: En | Year of publication: 2020 | Document type: Article
