ABSTRACT
PURPOSE: The aim of this study was to determine the variability in visually assessed mammographic breast density categorization among radiologists practicing in Indonesia, the Netherlands, South Africa, and the United States.
METHODS: At each of four global locations, 200 consecutive 2-D full-field digital screening mammograms obtained from September to December 2017 were selected and retrospectively reviewed, for a total of 800 mammograms. Three breast radiologists at each location (a team) provided consensus density assessments of all 800 mammograms using BI-RADS® density categorization. Interreader agreement was assessed using Gwet's AC2 with quadratic weighting across all four density categories and Gwet's AC1 for the binary comparison of combined not dense versus dense categories. Differences in density distribution among teams were tested using the Stuart-Maxwell test of marginal homogeneity across all four categories and the McNemar test for not dense versus dense categories. To compare each team's readings of its own 200 mammograms with those of the other three teams, density distributions were compared using conditional logistic regression.
RESULTS: For all 800 mammograms, interreader weighted agreement across the four density categories was 0.86 (Gwet's AC2 with quadratic weighting; 95% confidence interval, 0.85-0.88), and for not dense versus dense categories, it was 0.66 (Gwet's AC1; 95% confidence interval, 0.63-0.70). The density distribution across the four categories differed significantly both when teams were compared with one another and when one team was compared with the other three teams combined (P < .001). Overall, all readers placed the largest number of mammograms in the scattered fibroglandular density and heterogeneously dense categories.
CONCLUSIONS: Although reader teams from four global locations had almost perfect interreader agreement in BI-RADS density categorization, the differences in density distribution across the four categories remained statistically significant.
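As an illustration of the binary agreement statistic named in the methods, the following is a minimal sketch of the standard multi-rater form of Gwet's AC1, not the authors' analysis code. It assumes each team's consensus reading counts as one rating per mammogram. The function name `gwet_ac1`, the category labels, and the toy ratings are hypothetical and are not data from the study.

```python
from collections import Counter


def gwet_ac1(ratings, categories):
    """Gwet's AC1 for unweighted agreement among multiple raters.

    ratings: list of per-subject lists, one category label per rater
             (each subject needs at least two ratings).
    categories: list of all possible category labels.
    """
    q = len(categories)
    n = len(ratings)
    observed_sum = 0.0
    prevalence = {k: 0.0 for k in categories}

    for subject in ratings:
        r = len(subject)
        counts = Counter(subject)
        # Share of rater pairs that agree on this subject.
        observed_sum += sum(c * (c - 1) for c in counts.values()) / (r * (r - 1))
        # Share of raters choosing each category for this subject.
        for k in categories:
            prevalence[k] += counts.get(k, 0) / r

    pa = observed_sum / n
    pi = {k: v / n for k, v in prevalence.items()}
    # Chance agreement under Gwet's model.
    pe = sum(p * (1 - p) for p in pi.values()) / (q - 1)
    return (pa - pe) / (1 - pe)


# Toy example (hypothetical): four teams rating four mammograms.
toy_ratings = [
    ["dense", "dense", "dense", "dense"],
    ["not_dense", "not_dense", "not_dense", "not_dense"],
    ["dense", "dense", "dense", "not_dense"],
    ["not_dense", "not_dense", "dense", "dense"],
]
toy_ac1 = gwet_ac1(toy_ratings, ["not_dense", "dense"])
```

The four-category AC2 statistic reported above additionally applies quadratic weights to partial agreement between neighboring categories, which this unweighted sketch does not cover.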