Predicting second breast cancer among women with primary breast cancer using machine learning algorithms, a population-based observational study.

Syleouni, Maria-Eleni; Karavasiloglou, Nena; Manduchi, Laura; Wanner, Miriam; Korol, Dimitri; Ortelli, Laura; Bordoni, Andrea; Rohrmann, Sabine

Affiliation

Syleouni ME; Division of Chronic Disease Epidemiology, Epidemiology Biostatistics and Prevention Institute, University of Zurich, Zurich, Switzerland.
Karavasiloglou N; Cancer Registry Zurich, Zug, Schaffhausen and Schwyz, University Hospital Zurich, Zurich, Switzerland.
Manduchi L; Division of Chronic Disease Epidemiology, Epidemiology Biostatistics and Prevention Institute, University of Zurich, Zurich, Switzerland.
Wanner M; European Food Safety Authority, Parma, Italy.
Korol D; Medical Data Science, ETH Zurich, Zurich, Switzerland.
Ortelli L; Cancer Registry Zurich, Zug, Schaffhausen and Schwyz, University Hospital Zurich, Zurich, Switzerland.
Bordoni A; Cancer Registry Zurich, Zug, Schaffhausen and Schwyz, University Hospital Zurich, Zurich, Switzerland.
Rohrmann S; Ticino Cancer Registry, Public Health Division of Canton Ticino, Locarno, Switzerland.

Int J Cancer; 153(5): 932-941, 2023 Sep 01.

Article in English | MEDLINE | ID: mdl-37243372

ABSTRACT

Breast cancer survivors often experience recurrence or a second primary cancer. We developed an automated approach to predict the occurrence of any second breast cancer (SBC) using patient-level data and explored the generalizability of the models with an external validation data source. Breast cancer patients diagnosed between 2010 and 2018 from the cancer registry of Zurich, Zug, Schaffhausen and Schwyz (N = 3213; training dataset) and the cancer registry of Ticino (N = 1073; external validation dataset) were used for model training and validation, respectively. Machine learning (ML) methods, namely a feed-forward neural network (ANN), logistic regression, and extreme gradient boosting (XGB), were employed for classification. The best-performing model was selected based on the area under the receiver operating characteristic (ROC) curve. Key characteristics contributing to a high SBC risk were identified. SBC was diagnosed in 6% of all cases. The most important features for SBC prediction were age at incidence, year of birth, stage, and extent of the pathological primary tumor. The ANN model had the highest area under the ROC curve, with 0.78 (95% confidence interval [CI] 0.75-0.82) in the training data and 0.70 (95% CI 0.61-0.79) in the external validation data. Investigating the generalizability of different ML algorithms, we found that the ANN generalized better than the other models on the external validation data. This research is a first step towards the development of an automated tool that could assist clinicians in identifying women at high risk of developing an SBC and, potentially, in preventing it.
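The abstract reports each area under the ROC curve (AUC) with a 95% CI but does not state how the intervals were obtained. As a minimal sketch, assuming a percentile bootstrap over cases (one common choice; DeLong's method is another, and neither is confirmed by the source), the AUC and its interval could be computed as below. The function names `roc_auc` and `bootstrap_auc_ci` and the toy data are hypothetical and are not the study's code or data.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case outranks a random
    negative case (ties count as 0.5). Labels must be 0/1."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method, for illustration).
    Cases are resampled with replacement; resamples containing only one class
    are skipped because the AUC is undefined for them."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    # Order statistics approximating the alpha/2 and 1 - alpha/2 percentiles.
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical toy labels (1 = SBC) and predicted risks, NOT study data.
    y = [0, 0, 1, 1, 0, 1, 0, 0]
    p = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.3, 0.05]
    print(round(roc_auc(y, p), 3))  # 0.933
    print(bootstrap_auc_ci(y, p, n_boot=500))
```

In the study's design, such an interval would be computed separately on the training data and on the external validation data, which is how the two AUCs in the abstract (0.78 and 0.70) can be compared.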

Subjects

Breast Neoplasms; Humans; Female; Breast Neoplasms/diagnosis; Breast Neoplasms/epidemiology; Algorithms; Neural Networks, Computer; Breast; Machine Learning

Keywords

breast cancer; cancer registry; machine learning; prediction; second cancer

Full text: 1 | Database: MEDLINE | Main subject: Breast Neoplasms | Language: English | Year of publication: 2023 | Document type: Article