ABSTRACT
OBJECTIVES: We aimed to synthesise evidence from prospective studies of digital breast tomosynthesis (DBT) screening to assess its effectiveness compared to digital mammography (DM). Specifically, we examined whether DBT reduces interval cancer rates (ICRs) in population breast cancer screening. MATERIALS AND METHODS: We performed a systematic review and meta-analysis of DBT screening studies (identified from January 2013 to March 2024). We included both RCTs and non-randomised prospective studies that used an independent comparison for our primary outcome, ICR. The risk of bias was assessed with QUADAS-2. We compared the ICR, cancer detection rate (CDR), and recall rate of DBT and DM screening using random effects meta-analysis models. Subgroup analyses estimated outcomes by study design. Sensitivity analyses estimated absolute effects from relative effects. RESULTS: Ten prospective studies (three RCTs, seven non-randomised) were eligible; all had a low risk of bias. There were 205,245 DBT-screened and 306,476 DM-screened participants with follow-up for interval cancer data. The pooled absolute ICR did not significantly differ between DBT and DM: -2.92 per 10,000 screens (95% CI: -6.39 to 0.54); however, subsequent subgroup analysis indicated certain study designs may have biased this ICR estimate. Pooled ICR from studies that only sampled groups from the same time and region indicated DBT led to 5.50 fewer interval cancers per 10,000 screens (95% CI: -9.47 to -1.54). Estimates from subgroup analysis that compared randomised and non-randomised trials did not significantly differ. CONCLUSION: This meta-analysis provides suggestive evidence that DBT decreases ICR relative to DM screening; further evidence is needed to reduce uncertainty regarding ICR differences between DBT and DM. KEY POINTS: Question Does DBT have long-term benefits over standard DM? 
Finding We find suggestive evidence in our primary analysis and stronger evidence in a follow-up analysis that DBT reduces interval cancers. Clinical relevance This meta-analysis provides the first indication that DBT may detect additional cancers that are clinically meaningful, based on suggestive evidence of a reduction in ICR. This finding does not preclude the simultaneous possibility of overdiagnosis.
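The abstract above pools study-level ICR differences with random effects models but does not name the estimator. As a minimal sketch, assuming the common DerSimonian-Laird estimator and a normal-approximation 95% interval (both assumptions, not details from the abstract), the pooling step could look like this. The function name `pool_random_effects` and its inputs are hypothetical.

```python
import math


def pool_random_effects(effects, variances):
    """Pool study effects with a DerSimonian-Laird random effects model.

    effects   -- per-study effect estimates (e.g. ICR difference per 10,000 screens)
    variances -- per-study sampling variances of those estimates
    Returns (pooled_effect, (ci_low, ci_high), tau_squared).
    The estimator choice is an assumption; the abstract does not specify one.
    """
    k = len(effects)
    if k != len(variances) or k < 2:
        raise ValueError("need at least two studies with matching variances")

    # Fixed-effect (inverse-variance) weights and pooled mean.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q and the method-of-moments between-study variance.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random effects weights add tau2 to each study's variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2
```

When the interval returned for a pooled difference includes zero, the result corresponds to the "did not significantly differ" wording used for the primary analysis.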
ABSTRACT
BACKGROUND: There is limited evidence on the performance of digital breast tomosynthesis (DBT) in populations at increased risk of breast cancer. Our objective was to systematically review evidence on the performance of DBT versus digital mammography (DM) in women with a family history of breast cancer (FHBC). METHODS: We searched 5 databases (2011-January 2024) for studies comparing DBT and DM in women with a FHBC that reported any measure of cancer detection, recall, sensitivity and specificity. Findings were presented using a descriptive and narrative approach. Risk of bias was assessed using QUADAS-2/C. RESULTS: Five (4 screening, 1 diagnostic) studies were included (total 3089 DBT, 3024 DM) with most (4/5) being prospective including 1 RCT. All studies were assessed as being at high risk of bias or applicability concern. Four screening studies reported recall rate (range: DBT: 2.7%-4.5%, DM: 2.8%-11.5%) with 3 reporting DBT had lower rates than DM. Cancer detection rates (CDR) were reported in the same studies (DBT: 5.1-11.6 per 1000, DM: 3.8-8.3); 3 reported higher CDR for DBT (vs. DM), and 1 reported same CDR for both. Compared with DM, higher values for sensitivity, specificity and PPV for DBT were reported in 2 studies. CONCLUSION: This review provides early evidence that DBT may outperform DM for screening women with a FHBC. Our findings support further evaluation of DBT in this population. However, summarized findings were based on few studies and participants, and high-quality studies with improved methodology are needed to address biases identified in our review.
ABSTRACT
BACKGROUND: Biopsy-proven breast lesions such as atypical ductal hyperplasia (ADH) or atypical lobular hyperplasia (ALH), lobular carcinoma in situ (LCIS) and flat epithelial atypia (FEA) increase subsequent risk of breast cancer (BC), but long-term risk has not been synthesized. A systematic review was conducted to quantify future risk of breast cancer accounting for time since diagnosis of these high-risk lesions. METHODS: A systematic search of literature from 2000 was performed to identify studies reporting BC as an outcome following core-needle or excision biopsy histology diagnosis of ADH, ALH, LCIS, lobular neoplasia (LN) or FEA. Meta-analyses were conducted to estimate cumulative BC incidence at five-yearly intervals following initial diagnosis for each histology type. RESULTS: Seventy studies reporting on 47,671 subjects met eligibility criteria. BC incidence at five years post-diagnosis with a high-risk lesion was estimated to be 9.3 % (95 % CI 6.9-12.5 %) for LCIS, 6.6 % (95 % CI 4.4-9.7 %) for ADH, 9.7 % (95 % CI 5.3-17.2 %) for ALH, 8.6 % (95 % CI 6.5-11.4 %) for LN, and 3.8 % (95 % CI 1.2-11.7 %) for FEA. At ten years post-diagnosis, BC incidence was estimated to be 11.8 % (95 % CI 9.0-15.3 %) for LCIS, 13.9 % (95 % CI 7.8-23.6 %) for ADH, 15.4 % (95 % CI 7.2-29.3 %) for ALH, 17.0 % (95 % CI 7.2-35.3 %) for LN and 7.2 % (95 % CI 2.2-21.2 %) for FEA. CONCLUSION: Our findings demonstrate increased BC risk sustained over time since initial diagnosis of high-risk breast lesions, varying by lesion type, with relatively less evidence for FEA.
ABSTRACT
As breast screening services move towards use of healthcare AI (HCAI) for screen reading, research on public views of HCAI can inform more person-centered implementation. We synthesise reviews of public views of HCAI in general, and review primary studies of women's views of AI in breast screening. People generally appear open to HCAI and its potential benefits, despite a wide range of concerns; similarly, women are open towards AI in breast screening because of the potential benefits, but are concerned about a wide range of risks. Women want radiologists to remain central; oversight, evaluation and performance, care, equity and bias, transparency, and accountability are key issues; women may be less tolerant of AI error than of human error. Using our recent Australian primary study, we illustrate both the value of informing participants before collecting data, and women's views. The 40 screening-age women in this study stipulated four main conditions on breast screening AI implementation: 1) maintaining human control; 2) strong evidence of performance; 3) supporting familiarisation with AI; and 4) providing adequate reasons for introducing AI. Three solutions were offered to support familiarisation: transparency and information; slow and staged implementation; and allowing women to opt-out of AI reading. We provide recommendations to guide both implementation of AI in healthcare and research on public views of HCAI. Breast screening services should be transparent about AI use and share information about breast screening AI with women. Implementation should be slow and staged, providing opt-out options if possible. Screening services should demonstrate strong governance to maintain clinician control, demonstrate excellent AI system performance, assure data protection and bias mitigation, and give good reasons to justify implementation. When these measures are put in place, women are more likely to see HCAI use in breast screening as legitimate and acceptable.
Subjects
Artificial Intelligence , Breast Neoplasms , Early Detection of Cancer , Qualitative Research , Humans , Female , Breast Neoplasms/diagnosis , Breast Neoplasms/prevention & control , Early Detection of Cancer/methods , Australia , Adult , Middle Aged , Mammography/methods , Mass Screening/methods
ABSTRACT
Artificial intelligence (AI) algorithms have been retrospectively evaluated as replacement for one radiologist in screening mammography double-reading; however, methods for resolving discordance between radiologists and AI in the absence of 'real-world' arbitration may underestimate cancer detection rate (CDR) and recall. In 108,970 consecutive screens from a population screening program (BreastScreen WA, Western Australia), 20,120 were radiologist/AI discordant without real-world arbitration. Recall probabilities were randomly assigned for these screens in 1000 simulations. Recall thresholds for screen-detected and interval cancers (sensitivity) and no cancer (false-positive proportion, FPP) were varied to calculate mean CDR and recall rate for the entire cohort. Assuming 100% sensitivity, the maximum CDR was 7.30 per 1000 screens. To achieve >95% probability that the mean CDR exceeded the screening program CDR (6.97 per 1000), interval cancer sensitivities ≥63% (at 100% screen-detected sensitivity) and ≥91% (at 80% screen-detected sensitivity) were required. Mean recall rate was relatively constant across sensitivity assumptions, but varied by FPP. FPP > 6.5% resulted in recall rates that exceeded the program estimate (3.38%). CDR improvements depend on a majority of interval cancers being detected in radiologist/AI discordant screens. Such improvements are likely to increase recall, requiring careful monitoring where AI is deployed for screen-reading.
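The simulation idea in the abstract above can be illustrated with a simplified sketch. It assumes each cancer in a discordant screen is recalled (and so detected) independently with a probability equal to the assumed sensitivity. This is a simplification of the study's method, and all counts used in any call are hypothetical since the abstract does not report them. The function name `simulate_mean_cdr` is also hypothetical.

```python
import random


def simulate_mean_cdr(n_screens, base_detected, discordant_sd, discordant_ic,
                      sens_sd, sens_ic, n_sims=1000, seed=0):
    """Mean cancer detection rate (per 1000 screens) over repeated simulations.

    n_screens     -- total screens in the cohort
    base_detected -- cancers detected outside the discordant screens
    discordant_sd -- screen-detected cancers in radiologist/AI discordant screens
    discordant_ic -- interval cancers in radiologist/AI discordant screens
    sens_sd       -- assumed probability a discordant screen-detected cancer is recalled
    sens_ic       -- assumed probability a discordant interval cancer is recalled
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    total = 0.0
    for _ in range(n_sims):
        # Each discordant cancer is recalled independently with the assumed sensitivity.
        sd_found = sum(rng.random() < sens_sd for _ in range(discordant_sd))
        ic_found = sum(rng.random() < sens_ic for _ in range(discordant_ic))
        total += 1000.0 * (base_detected + sd_found + ic_found) / n_screens
    return total / n_sims
```

Varying `sens_ic` while holding `sens_sd` fixed mirrors how the abstract reports the interval cancer sensitivity needed for the mean CDR to exceed the program CDR.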
ABSTRACT
This study aimed to estimate participation in private breast screening in Queensland, Australia, where public-funded screening is implemented, and to identify factors associated with the screening setting, using an online survey (999 female respondents aged 40-74). Screening-specific and socio-demographic factors were collected. Multivariable logistic regression was used to identify factors associated with screening setting (public vs private) and screening recency (<2 vs ≥2 years). Participation estimates were 53.2% (95% confidence interval, CI: 50.0%-56.3%) and 10.9% (9.0%-13.0%) for national screening program and private screening, respectively. In the screening setting model, participation in private screening was significantly associated with longer time since last screening (>4 versus <2 years, odds ratio (OR) = 7.3, 95%CI: 4.1-12.9, p < 0.001), having symptoms (OR = 9.5, 5.8-15.5, p < 0.001), younger age (40-49 versus 50-74 years, OR = 1.8, 1.1-3.0, p = 0.018) and having children <18 years in household (OR = 2.4, 1.5-3.9, p < 0.001). In the screening recency model, only screening setting was statistically significant and private screening was associated with screening recency ≥2 years (OR = 4.0, 2.8-5.7, p < 0.001). Around one in nine women screen outside of the BreastScreen Queensland program. Clinical and socio-demographic factors associated with participation in private screening were identified, providing knowledge relevant to the program's endeavours to improve screening participation.
Subjects
Breast Neoplasms , Early Detection of Cancer , Humans , Female , Middle Aged , Breast Neoplasms/diagnosis , Breast Neoplasms/epidemiology , Cross-Sectional Studies , Adult , Queensland/epidemiology , Aged , Retrospective Studies , Early Detection of Cancer/statistics & numerical data , Mammography/statistics & numerical data , Private Sector/statistics & numerical data , Mass Screening/statistics & numerical data
ABSTRACT
PURPOSE: Artificial intelligence (AI) for reading breast screening mammograms could potentially replace (some) human-reading and improve screening effectiveness. This systematic review aims to identify and quantify the types of AI errors to better understand the consequences of implementing this technology. METHODS: Electronic databases were searched for external validation studies of the accuracy of AI algorithms in real-world screening mammograms. Descriptive synthesis was performed on error types and frequency. False negative proportions (FNP) and false positive proportions (FPP) were pooled within AI positivity thresholds using random-effects meta-analysis. RESULTS: Seven retrospective studies (447,676 examinations; published 2019-2022) met inclusion criteria. Five studies reported AI error as false negatives or false positives. Pooled FPP decreased incrementally with increasing positivity threshold (71.83% [95% CI 69.67, 73.90] at Transpara 3 to 10.77% [95% CI 8.34, 13.79] at Transpara 9). Pooled FNP increased incrementally from 0.02% [95% CI 0.01, 0.03] (Transpara 3) to 0.12% [95% CI 0.06, 0.26] (Transpara 9), consistent with a trade-off with FPP. Heterogeneity within thresholds reflected algorithm version and completeness of the reference standard. Other forms of AI error were reported rarely (location error and technical error in one study each). CONCLUSION: AI errors are largely interpreted in the framework of test accuracy. FP and FN errors show expected variability not only by positivity threshold, but also by algorithm version and study quality. Reporting of other forms of AI errors is sparse, despite their potential implications for adoption of the technology. Considering broader types of AI error would add nuance to reporting that can inform inferences about AI's utility.
Subjects
Artificial Intelligence , Breast Neoplasms , Mammography , Humans , Mammography/methods , Mammography/standards , Female , Breast Neoplasms/diagnostic imaging , Early Detection of Cancer/methods , Algorithms , False Positive Reactions , Diagnostic Errors , False Negative Reactions
ABSTRACT
BACKGROUND: Digital breast tomosynthesis (DBT) for breast cancer screening has been shown in international trials to increase cancer detection compared with mammography; however, results have varied across screening settings, and currently there is limited and conflicting evidence on interval cancer rates (a surrogate for screening effectiveness). Australian pilot data also indicated substantially longer screen-reading time for DBT posing a barrier for adoption. There is a critical need for evidence on DBT to inform its role in Australia, including evaluation of potentially more feasible models of implementation, and quantification of screening outcomes by breast density which has global relevance. METHODS: This study is a prospective trial embedded in population-based Australian screening services (Maroondah BreastScreen, Eastern Health, Victoria) comparing hybrid screening comprising DBT (mediolateral oblique view) and digital mammography (cranio-caudal view) with standard mammography screening in a concurrent group attending another screening site. All eligible women aged ≥40 years attending the Maroondah service for routine screening will be enrolled (unless they do not provide verbal consent and opt-out of hybrid screening; are unable to provide consent; or where a 'pushback' image on hybrid DBT cannot be obtained). Each arm will enrol 20,000 women. The primary outcomes are cancer detection rate (per 1000 screens) and recall rate (percentage). Secondary outcomes include 'opt-out' rate; cohort characteristics; cancer characteristics; assessment outcomes; screen-reading time; and interval cancer rate at 24-month follow-up. Automated volumetric breast density will be measured to allow stratification of outcomes by mammographic density. Stratification by age and screening round will also be undertaken. An interim analysis will be undertaken after the first 5000 screens in the intervention group. 
DISCUSSION: This is the first Australian prospective trial comparing hybrid DBT/mammography with standard mammography screening that is powered to show differences in cancer detection. Findings will inform future implementation of DBT in screening programs world-wide and provide evidence on whether DBT should be adopted in the broader BreastScreen program in Australia or in subgroups of screening participants. TRIAL REGISTRATION: The trial is registered with the Australian New Zealand Clinical Trials Registry (ANZCTR, ACTRN12623001144606, https://www.anzctr.org.au/). Registration will be updated to reflect trial progress and protocol amendments.
Subjects
Breast Neoplasms , Female , Humans , Australia , Breast/diagnostic imaging , Breast Neoplasms/diagnostic imaging , Early Detection of Cancer/methods , Mammography/methods , Mass Screening/methods , Prospective Studies , Non-Randomized Controlled Trials as Topic
ABSTRACT
PURPOSE: To summarize the literature regarding the performance of mammography-image based artificial intelligence (AI) algorithms, with and without additional clinical data, for future breast cancer risk prediction. MATERIALS AND METHODS: A systematic literature review was performed using six databases (medRxiv, bioRxiv, Embase, Engineering Village, IEEE Xplore, and PubMed) from 2012 through September 30, 2022. Studies were included if they used real-world screening mammography examinations to validate AI algorithms for future risk prediction based on images alone or in combination with clinical risk factors. The quality of studies was assessed, and predictive accuracy was recorded as the area under the receiver operating characteristic curve (AUC). RESULTS: Sixteen studies met inclusion and exclusion criteria, of which 14 studies provided AUC values. The median AUC performance of AI image-only models was 0.72 (range 0.62-0.90) compared with 0.61 for breast density or clinical risk factor-based tools (range 0.54-0.69). Of the seven studies that compared AI image-only performance directly to combined image + clinical risk factor performance, six demonstrated no significant improvement, and one study demonstrated improvement. CONCLUSIONS: Early efforts for predicting future breast cancer risk based on mammography images alone demonstrate accuracy comparable to or better than traditional risk tools, with little or no improvement when adding clinical risk factor data. Transitioning from clinical risk factor-based to AI image-based risk models may lead to more accurate, personalized risk-based screening approaches.
Subjects
Breast Neoplasms , Humans , Female , Breast Neoplasms/diagnostic imaging , Mammography/methods , Artificial Intelligence , Early Detection of Cancer/methods , Breast/diagnostic imaging , Retrospective Studies
ABSTRACT
OBJECTIVE: This follow-up study of BreastScreen Victoria's pilot trial of digital breast tomosynthesis aimed to report interval cancer rates, screening sensitivity, and density-stratified outcomes for tomosynthesis vs mammography screening. METHODS: Prospective pilot trial [ACTRN-12617000947303] in Maroondah BreastScreen recruited females ≥ 40 years presenting for screening (August 2017-November 2018) to DBT; concurrent screening participants who received mammography formed a comparison group. Follow-up of 24 months from screen date was used to ascertain interval cancers; automated breast density was measured. RESULTS: There were 48 screen-detected and 9 interval cancers amongst 4908 tomosynthesis screens, and 34 screen-detected and 16 interval cancers amongst 5153 mammography screens. Interval cancer rate was 1.8/1000 (95%CI 0.8-3.5) for tomosynthesis vs 3.1/1000 (95%CI 1.8-5.0) for mammography (p = 0.20). Sensitivity of tomosynthesis (86.0%; 95% CI 74.2-93.7) was significantly higher than mammography (68.0%; 95% CI 53.3-80.5), p = 0.03. Cancer detection rate (CDR) of 9.8/1000 (95%CI 7.2-12.9) for tomosynthesis was higher than that of 6.6/1000 (95%CI 4.6-9.2) for mammography (p = 0.08); density-stratified analyses showed CDR was significantly higher for tomosynthesis than mammography (10.6/1000 vs 3.5/1000, p = 0.03) in high-density screens. Recall rate for tomosynthesis was significantly higher than for mammography (4.2% vs 3.0%, p < 0.001), and this increase in recall for tomosynthesis was evident only in high-density screens (5.6% vs 2.9%, p < 0.001). CONCLUSION: Although interval cancer rates did not significantly differ between screened groups, sensitivity was significantly higher for tomosynthesis than mammography screening. ADVANCES IN KNOWLEDGE: In a program-embedded pilot trial, both increased cancer detection and recall rates from tomosynthesis were predominantly observed in high-density screens.
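The per-1000 rates in the abstract above follow directly from the reported counts. The short sketch below reproduces that arithmetic; the helper name `rate_per_1000` is hypothetical, and the counts are the ones stated in the abstract.

```python
def rate_per_1000(events, screens):
    """Return the number of events per 1000 screens."""
    if screens <= 0:
        raise ValueError("screens must be positive")
    return 1000.0 * events / screens


# Counts reported in the abstract.
DBT_SCREENS, DM_SCREENS = 4908, 5153

icr_dbt = rate_per_1000(9, DBT_SCREENS)    # interval cancer rate, tomosynthesis
icr_dm = rate_per_1000(16, DM_SCREENS)     # interval cancer rate, mammography
cdr_dbt = rate_per_1000(48, DBT_SCREENS)   # cancer detection rate, tomosynthesis
cdr_dm = rate_per_1000(34, DM_SCREENS)     # cancer detection rate, mammography

print(f"ICR: {icr_dbt:.1f} vs {icr_dm:.1f} per 1000")  # ICR: 1.8 vs 3.1 per 1000
print(f"CDR: {cdr_dbt:.1f} vs {cdr_dm:.1f} per 1000")  # CDR: 9.8 vs 6.6 per 1000
```

The confidence intervals and p-values quoted in the abstract depend on the authors' chosen statistical methods and are not reproduced here.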
Subjects
Breast Neoplasms , Neoplasms , Female , Humans , Breast Density , Breast Neoplasms/diagnostic imaging , Early Detection of Cancer , Follow-Up Studies , Mammography , Mass Screening , Pilot Projects , Prospective Studies , Adult , Middle Aged
ABSTRACT
BACKGROUND: Artificial intelligence (AI) has been proposed to reduce false-positive screens, increase cancer detection rates (CDRs), and address resourcing challenges faced by breast screening programs. We compared the accuracy of AI versus radiologists in real-world population breast cancer screening, and estimated potential impacts on CDR, recall and workload for simulated AI-radiologist reading. METHODS: External validation of a commercially-available AI algorithm in a retrospective cohort of 108,970 consecutive mammograms from a population-based screening program, with ascertained outcomes (including interval cancers by registry linkage). Area under the ROC curve (AUC), sensitivity and specificity for AI were compared with radiologists who interpreted the screens in practice. CDR and recall were estimated for simulated AI-radiologist reading (with arbitration) and compared with program metrics. FINDINGS: The AUC for AI was 0.83 compared with 0.93 for radiologists. At a prospective threshold, sensitivity for AI (0.67; 95% CI: 0.64-0.70) was comparable to radiologists (0.68; 95% CI: 0.66-0.71) with lower specificity (0.81 [95% CI: 0.81-0.81] versus 0.97 [95% CI: 0.97-0.97]). Recall rate for AI-radiologist reading (3.14%) was significantly lower than for the BSWA program (3.38%) (-0.25%; 95% CI: -0.31 to -0.18; P < 0.001). CDR was also lower (6.37 versus 6.97 per 1000) (-0.61; 95% CI: -0.77 to -0.44; P < 0.001); however, AI detected interval cancers that were not found by radiologists (0.72 per 1000; 95% CI: 0.57-0.90). AI-radiologist reading increased arbitration but decreased overall screen-reading volume by 41.4% (95% CI: 41.2-41.6). INTERPRETATION: Replacement of one radiologist by AI (with arbitration) resulted in lower recall and overall screen-reading volume. There was a small reduction in CDR for AI-radiologist reading. 
AI detected interval cases that were not identified by radiologists, suggesting potentially higher CDR if radiologists were unblinded to AI findings. These results indicate AI's potential role as a screen-reader of mammograms, but prospective trials are required to determine whether CDR could improve if AI detection was actioned in double-reading with arbitration. FUNDING: National Breast Cancer Foundation (NBCF), National Health and Medical Research Council (NHMRC).
Subjects
Breast Neoplasms , Humans , Female , Breast Neoplasms/diagnostic imaging , Breast Neoplasms/epidemiology , Artificial Intelligence , Retrospective Studies , Prospective Studies , Cohort Studies , Mass Screening/methods , Early Detection of Cancer/methods , Mammography/methods
ABSTRACT
BACKGROUND: A 2014 SSO-ASTRO guideline on surgical margins aimed to reduce unnecessary reoperation after breast conserving surgery (BCS). We investigate whether publication of the guideline was associated with a reduction in reoperation in Western Australia (WA). METHODS: In this retrospective, population-based cohort study, cases of newly-diagnosed breast cancer were identified from the WA Cancer Registry. Linkage to the Hospital Morbidity Data Collection identified index BCS for invasive cancer between January 2009 and June 2018 (N = 8059) and reoperation within 90 days. Pre-guideline (2009-2013) and post-guideline (2014-2018) reoperation proportions were compared, and temporal trends were estimated with generalised linear regression. RESULTS: The pre-guideline reoperation proportion was 25.8% compared with 21.7% post-guideline (difference -4.0% [95% CI -5.9, -2.2, p < 0.001], odds ratio [OR] 0.80 [95% CI 0.72, 0.89, p < 0.001]). Absolute reductions were similar for repeat BCS (16.3% versus 14.6%; difference -1.8% [95% CI -3.4, -0.2, p = 0.03]) and conversion to mastectomy (9.4% versus 7.2%; difference -2.2% [95% CI -3.4, -1.0, p < 0.001]). Over the study period, there was an annual absolute change in reoperation of -0.8% (95% CI -1.2, -0.5, p < 0.001). Accounting for this linear trend, the difference in reoperation between time periods was -0.5% (95% CI -4.3, 3.3; p = 0.81), reflecting a non-significant reduction in conversion to mastectomy. CONCLUSIONS: Comparisons of pre- versus post-guideline time periods in WA showed reductions in reoperation that were similar to international estimates; however, an annual decline in reoperation predated the guideline. Analyses that do not account for temporal trends are likely to overestimate changes in reoperation associated with the guideline.
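The unadjusted odds ratio in the abstract above can be recovered from the two reported reoperation proportions. The sketch below shows the calculation; the function name `odds_ratio` is hypothetical, and because it starts from rounded percentages, the result matches the published estimate only to the precision reported.

```python
def odds_ratio(p_post, p_pre):
    """Odds ratio comparing a post-period proportion with a pre-period proportion."""
    for p in (p_post, p_pre):
        if not 0.0 < p < 1.0:
            raise ValueError("proportions must lie strictly between 0 and 1")
    odds_post = p_post / (1.0 - p_post)
    odds_pre = p_pre / (1.0 - p_pre)
    return odds_post / odds_pre


# Reoperation proportions reported in the abstract: 21.7% post-guideline, 25.8% pre-guideline.
print(f"OR = {odds_ratio(0.217, 0.258):.2f}")  # OR = 0.80
```

This is a crude before-and-after comparison. As the abstract notes, it does not account for the pre-existing annual decline in reoperation, which is why the trend-adjusted difference is much smaller.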
Subjects
Breast Neoplasms , Humans , Female , Breast Neoplasms/surgery , Mastectomy, Segmental , Mastectomy , Reoperation , Retrospective Studies , Cohort Studies , Western Australia , Margins of Excision
ABSTRACT
BACKGROUND: Breast cancer care has been affected by the COVID-19 pandemic. This systematic review aims to describe the observed pandemic-related changes in clinical and health services outcomes for breast screening and diagnosis. METHODS: Seven databases (January 2020-March 2021) were searched to identify studies of breast cancer screening or diagnosis that reported observed outcomes before and related to the pandemic. Findings were presented using a descriptive and narrative approach. RESULTS: Seventy-four studies were included in this systematic review; all compared periods before and after (or fluctuations during) the pandemic. None were assessed as being at low risk of bias. A reduction in screening volumes during the pandemic was found with over half of studies reporting reductions of ≥49%. A majority (66%) of studies reported reductions of ≥25% in the number of breast cancer diagnoses, and there was a higher proportion of symptomatic than screen-detected cancers. The distribution of cancer stage at diagnosis during the pandemic showed lower proportions of early-stage (stage 0-1/I-II, or Tis and T1) and higher proportions of relatively more advanced cases than that in the pre-pandemic period, however population rates were generally not reported. CONCLUSIONS: Evidence of substantial reductions in screening volume and number of diagnosed breast cancers, and higher proportions of advanced stage cancer at diagnosis were found during the pandemic. However, these findings reflect short term outcomes, and higher-quality research examining the long-term impact of the pandemic is needed.
Subjects
Breast Neoplasms , COVID-19 , Humans , Female , Breast Neoplasms/diagnosis , Breast Neoplasms/epidemiology , COVID-19/diagnosis , COVID-19/epidemiology , Pandemics , Early Detection of Cancer , Neoplasm Staging , COVID-19 Testing
ABSTRACT
BACKGROUND: We examined whether digital breast tomosynthesis (DBT) detects differentially in high- or low-density screens. METHODS: We searched six databases (2009-2020) for studies comparing DBT and digital mammography (DM), and reporting cancer detection rate (CDR) and/or recall rate by breast density. Meta-analysis was performed to pool incremental CDR and recall rate for DBT (versus DM) for high- and low-density (dichotomised based on BI-RADS) and within-study differences in incremental estimates between high- and low-density. Screening settings (European/US) were compared. RESULTS: Pooled within-study difference in incremental CDR for high- versus low-density was 1.0/1000 screens (95% CI: 0.3, 1.6; p = 0.003). Estimates were not significantly different in US (0.6/1000; 95% CI: 0.0, 1.3; p = 0.05) and European (1.9/1000; 95% CI: 0.3, 3.5; p = 0.02) settings (p for subgroup difference = 0.15). For incremental recall rate, within-study differences between density subgroups differed by setting (p < 0.001). Pooled incremental recall was less in high- versus low-density screens (-0.9%; 95% CI: -1.4%, -0.4%; p < 0.001) in US screening, and greater (0.8%; 95% CI: 0.3%, 1.3%; p = 0.001) in European screening. CONCLUSIONS: DBT has differential incremental cancer detection and recall by breast density. Although incremental CDR is greater in high-density, a substantial proportion of additional cancers is likely to be detected in low-density screens. Our findings may assist screening programmes considering DBT for density-tailored screening.
Subjects
Breast Density , Breast Neoplasms , Breast Neoplasms/diagnostic imaging , Early Detection of Cancer , Female , Humans , Mammography , Mass Screening , Research
ABSTRACT
PURPOSE: The aim of this study was to describe the current state of science regarding independent external validation of artificial intelligence (AI) technologies for screening mammography. METHODS: A systematic review was performed across five databases (Embase, PubMed, IEEE Xplore, Engineering Village, and arXiv) through December 10, 2020. Studies that used screening examinations from real-world settings to externally validate AI algorithms for mammographic cancer detection were included. The main outcome was diagnostic accuracy, defined by area under the receiver operating characteristic curve (AUC). Performance was also compared between radiologists and either stand-alone AI or combined radiologist and AI interpretation. Study quality was assessed using the Quality Assessment of Diagnostic Accuracy Studies 2 tool. RESULTS: After data extraction, 13 studies met the inclusion criteria (148,361 total patients). Most studies (77% [n = 10]) evaluated commercially available AI algorithms. Studies included retrospective reader studies (46% [n = 6]), retrospective simulation studies (38% [n = 5]), or both (15% [n = 2]). Across 5 studies comparing stand-alone AI with radiologists, 60% (n = 3) demonstrated improved accuracy with AI (AUC improvement range, 0.02-0.13). All 5 studies comparing combined radiologist and AI interpretation with radiologists alone demonstrated improved accuracy with AI (AUC improvement range, 0.028-0.115). Most studies had risk for bias or applicability concerns for patient selection (69% [n = 9]) and the reference standard (69% [n = 9]). Only two studies obtained ground-truth cancer outcomes through regional cancer registry linkage. CONCLUSIONS: To date, external validation efforts for AI screening mammographic technologies suggest small potential diagnostic accuracy improvements but have been retrospective in nature and suffer from risk for bias and applicability concerns.
Subjects
Artificial Intelligence , Breast Neoplasms , Algorithms , Breast Neoplasms/diagnostic imaging , Early Detection of Cancer , Female , Humans , Mammography , Retrospective Studies
ABSTRACT
INTRODUCTION: Artificial intelligence (AI) algorithms for interpreting mammograms have the potential to improve the effectiveness of population breast cancer screening programmes if they can detect cancers, including interval cancers, without contributing substantially to overdiagnosis. Studies suggesting that AI has comparable or greater accuracy than radiologists commonly employ 'enriched' datasets in which cancer prevalence is higher than in population screening. Routine screening outcome metrics (cancer detection and recall rates) cannot be estimated from these datasets, and accuracy estimates may be subject to spectrum bias which limits generalisability to real-world screening. We aim to address these limitations by comparing the accuracy of AI and radiologists in a consecutive cohort of women attending a real-world population breast cancer screening programme. METHODS AND ANALYSIS: A retrospective, consecutive cohort of digital mammography screens from 109 000 distinct women was assembled from BreastScreen WA (BSWA), Western Australia's biennial population screening programme, from November 2016 to December 2017. The cohort includes 761 screen-detected and 235 interval cancers. Descriptive characteristics and results of radiologist double-reading will be extracted from BSWA outcomes data collection. Mammograms will be reinterpreted by a commercial AI algorithm (DeepHealth). AI accuracy will be compared with that of radiologist single-reading based on the difference in the area under the receiver operating characteristic curve. Cancer detection and recall rates for combined AI-radiologist reading will be estimated by pairing the first radiologist read per screen with the AI algorithm, and compared with estimates for radiologist double-reading. ETHICS AND DISSEMINATION: This study has ethical approval from the Women and Newborn Health Service Ethics Committee (EC00350) and the Curtin University Human Research Ethics Committee (HRE2020-0316). 
Findings will be published in peer-reviewed journals and presented at national and international conferences. Results will also be disseminated to stakeholders in Australian breast cancer screening programmes and policy makers in population screening.
Subjects
Breast Neoplasms , Early Detection of Cancer , Artificial Intelligence , Australia , Breast Neoplasms/diagnostic imaging , Cohort Studies , Early Detection of Cancer/methods , Female , Humans , Infant, Newborn , Mammography/methods , Mass Screening , Retrospective Studies
ABSTRACT
Importance: The 2014 publication of the Society of Surgical Oncology-American Society for Radiation Oncology (SSO-ASTRO) Consensus Guideline on Margins for Breast-Conserving Surgery recommended a negative margin definition of no ink on tumor. Adoption of this guideline would represent a major change in surgical practice that could lower the rates of reoperation. Objective: To assess changes in reoperation rates after publication of the SSO-ASTRO guideline. Data Sources: A systematic search of Embase, PREMEDLINE, Evidence-Based Medicine Reviews, Scopus, and Web of Science for biomedical literature published from January 2014 to July 2019 was performed. This search was supplemented by web searches and manual searching of conference abstracts. Study Selection: Included studies compared the reoperation rates in preguideline vs postguideline cohorts (actual change), retrospectively applied the SSO-ASTRO guideline to a preguideline cohort (projected change), or described the economic outcomes of the guideline. Data Extraction and Synthesis: Study characteristics and reoperation rates were extracted independently by 2 reviewers. Odds ratios (ORs) were pooled by random effects meta-analysis. Analyses were stratified by study setting (institutional or population) and preguideline accepted margins. The economic outcomes of the guideline were summarized narratively. The Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) reporting guideline was followed. Main Outcomes and Measures: Odds ratios for postguideline vs preguideline reoperation rates. Results: From 1114 citations, 30 studies (with 599 016 participants) reported changes in reoperation rates. Studies included a median (range) of 487 (100-521 578) participants, and 20 studies were undertaken in the US, 6 in the UK, 3 in Canada, and 1 in Australia. 
Among 21 studies of actual changes, pooled ORs showed a statistically significant reduction in reoperation, with an OR lower in institution-based studies than in population-based studies (OR, 0.62 [95% CI, 0.52-0.74] vs 0.76 [95% CI, 0.72-0.80]; P = .04 for subgroup differences). Among 9 studies of projected changes, the pooled OR was lower for preguideline margin thresholds of 2 mm or more compared with 1 mm (OR, 0.47 [95% CI, 0.40-0.56] vs 0.85 [95% CI, 0.79-0.91]; P < .001 for subgroup differences). Projected changes were likely to overestimate actual changes. Six studies that estimated the postguideline economic outcome found the guideline to be potentially cost saving, with a median (range) saving of US $3540 ($1800-$25 650) per woman avoiding reoperation. Conclusions and Relevance: This study found a decrease in reoperation rates after the publication of the SSO-ASTRO guideline; this reduction was greater at an institutional level than a population level, the latter reflecting the differences in guideline adoption between centers. These early outcomes may be conservative estimates of longer-term implications.
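The subgroup comparison reported above can be illustrated with a minimal sketch. It assumes a simple z-test on the difference of two pooled log odds ratios, with standard errors back-calculated from the reported 95% CIs; this is an approximation for illustration and not necessarily the test the authors used. The function names are hypothetical, and the inputs are the rounded institutional and population ORs quoted in the abstract.

```python
import math

def log_or_and_se(or_point, ci_low, ci_high, z=1.96):
    """Return log(OR) and a standard error recovered from a 95% CI.

    Assumes the CI is symmetric on the log scale (an approximation,
    since the published values are rounded).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return math.log(or_point), se

def subgroup_difference(a, b):
    """z statistic and two-sided p-value for the difference of two log ORs."""
    log_a, se_a = log_or_and_se(*a)
    log_b, se_b = log_or_and_se(*b)
    z_stat = (log_a - log_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z_stat) / math.sqrt(2))
    return z_stat, p_value

# Pooled ORs (point estimate, lower CI, upper CI) as quoted in the abstract.
institutional = (0.62, 0.52, 0.74)
population = (0.76, 0.72, 0.80)

z_stat, p_value = subgroup_difference(institutional, population)
print(f"z = {z_stat:.2f}, two-sided p = {p_value:.3f}")
```

Because the inputs are rounded to two decimals, the result should land in the same range as the reported P = .04 rather than match it exactly.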