ABSTRACT
PURPOSE: Consistent breast density categorization is important in research, and minimizing interobserver and intraobserver variability is essential. This research aimed to validate a set of mammography images for visual breast density estimation, to achieve consistency in future research projects, and to determine observer performance.

METHODS: Using the Breast Imaging Reporting and Data System (BI-RADS) as the visual grading scale, 50 mammography images were scored for density grade by 8 observers.

RESULTS: Six of 8 observers achieved almost perfect intraobserver agreement (kappa > 0.81). Strong interobserver agreement (kappa = 0.61-0.80) was found in 10 of 28 paired observation episodes on the first iteration and in 12 of 28 on the second. No observer demonstrated a delta variance greater than 1. Fleiss' kappa was used to evaluate concordance among all observers on each iteration (first iteration, 0.64; second iteration, 0.56).

DISCUSSION: This research illustrates the difficulty of comparing observers' visual performance scores, because scores can differ both within and among individuals when studies are repeated.

CONCLUSION: We confirmed that the 50 images were suitable for research purposes. Some variability existed among observers; however, overall density classification agreement was strong. Future research should include repeating this study with digitally acquired images.
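Because the analysis rests on Fleiss' kappa for multi-rater agreement, a minimal sketch of the statistic may help readers follow the reported values. The function name and toy data below are illustrative only and are not the study's data; the input is assumed to be per-image counts of how many observers assigned each density category.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for N subjects rated by n raters into k categories.

    ratings: list of rows, one per subject (image); each row holds the
    count of raters who chose each category, and each row sums to n.
    """
    N = len(ratings)        # number of subjects (e.g., mammography images)
    n = sum(ratings[0])     # raters per subject
    k = len(ratings[0])     # number of categories (e.g., BI-RADS a-d)

    # p_j: overall proportion of assignments falling in category j
    p = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]

    # P_i: observed agreement on subject i (pairwise agreement among raters)
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]

    P_bar = sum(P) / N                 # mean observed agreement
    P_e = sum(pj * pj for pj in p)     # agreement expected by chance
    return (P_bar - P_e) / (1 - P_e)

# Toy example (hypothetical): 4 raters, 3 images, 2 categories,
# with every rater agreeing on every image.
print(fleiss_kappa([[4, 0], [0, 4], [4, 0]]))  # → 1.0
```

As a design note, the statistic compares mean pairwise agreement against the agreement expected if raters assigned categories at random with the observed marginal frequencies, so kappa = 1 only under complete agreement and values near 0 indicate chance-level concordance.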