ABSTRACT
BACKGROUND: Visual assessment remains the most common method of evaluating mammographic breast composition worldwide, although its subjectivity limits reproducibility. This study investigated inter- and intra-observer variability in qualitative visual assessment of mammographic breast composition through a multi-institutional observer performance study, the first of its kind in Japan. METHODS: Ten Japanese physicians from five different institutions participated. Using the new Japanese breast-composition classification system (4th edition), they subjectively evaluated breast composition in 200 pairs of right and left normal mediolateral oblique mammograms (median patient age: 59 years [range 40-69 years]) twice, with a 1-month interval between readings. The number of mammogram pairs was determined using precise sample size calculations. The primary endpoint was inter-observer variability, measured using the kappa (κ) value. RESULTS: Inter-observer agreement was moderate for the four-class breast-composition assessment (Fleiss' κ: first and second reading = 0.553 and 0.587, respectively) and substantial for the two-class assessment (Fleiss' κ: first and second reading = 0.689 and 0.70, respectively). Intra-observer agreement was substantial for the four-class assessment (Cohen's κ, median = 0.758) and almost perfect for the two-class assessment (Cohen's κ, median = 0.813). Agreement between the consensus assessment of the 10 physicians and the automated software Volpara® was slight (Cohen's κ; first and second reading: 0.104 and 0.075, respectively). CONCLUSIONS: Qualitative visual assessment of mammographic breast composition using the new Japanese classification showed excellent intra-observer reproducibility. However, inter-observer variability persists, which presents a challenge to establishing it as the gold standard in Japan.
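For readers unfamiliar with the agreement statistics reported above, the following is a minimal sketch of how Fleiss' κ (several raters) and Cohen's κ (two ratings of the same cases) can be computed from category labels. It is not the study's analysis code. The function names, the toy labels "A" to "D", and the example ratings are hypothetical, and the mapping from κ to verbal labels assumes the widely used Landis and Koch bands, which match the terms in the abstract but are not named there.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa. `ratings` holds one list per case, with one label per rater."""
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(row) for row in ratings]
    # Observed agreement: share of agreeing rater pairs, averaged over cases.
    per_case = [
        (sum(c[k] ** 2 for k in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_obs = sum(per_case) / n_cases
    # Chance agreement from the overall proportion of each category.
    shares = [sum(c[k] for c in counts) / (n_cases * n_raters) for k in categories]
    p_exp = sum(s * s for s in shares)
    return (p_obs - p_exp) / (1 - p_exp)


def cohen_kappa(first, second):
    """Cohen's kappa for two label sequences covering the same cases."""
    n = len(first)
    p_obs = sum(a == b for a, b in zip(first, second)) / n
    c1, c2 = Counter(first), Counter(second)
    # Chance agreement from the two marginal label distributions.
    p_exp = sum(c1[k] * c2[k] for k in set(first) | set(second)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


def agreement_label(kappa):
    """Verbal label for kappa, assuming the Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Toy data only (NOT from the study): 4 cases rated by 3 hypothetical readers.
    toy = [["A", "A", "B"], ["C", "C", "C"], ["B", "C", "C"], ["D", "D", "D"]]
    k = fleiss_kappa(toy, ["A", "B", "C", "D"])
    print(f"Fleiss' kappa = {k:.3f} ({agreement_label(k)})")
    # Intra-observer style comparison: one reader's first and second reading.
    k2 = cohen_kappa(["A", "B", "C", "D"], ["A", "B", "C", "C"])
    print(f"Cohen's kappa = {k2:.3f} ({agreement_label(k2)})")
```

Applied to the values reported above, `agreement_label` returns "moderate" for 0.553, "substantial" for 0.689 and 0.758, "almost perfect" for 0.813, and "slight" for 0.104, consistent with the wording of the RESULTS.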