ABSTRACT
BACKGROUND: More than 4500 women are diagnosed with breast cancer each year in Denmark. Despite adequate treatment, 10-30% of patients will experience a recurrence. The Danish Breast Cancer Group (DBCG) stores information on breast cancer recurrence, but automated identification of patients with recurrence is needed to improve data completeness. METHODS: We included patient data from the DBCG, the National Pathology Database, and the National Patient Registry for patients with an invasive breast cancer diagnosis after 1999. In total, relevant features of 79,483 patients with a definitive surgery were extracted. A machine learning (ML) model was trained, using a simplistic feature-encoding scheme, on a development sample covering 5333 patients with known recurrence and three times as many women without recurrence. The model was validated in a validation sample consisting of 1006 patients with unknown recurrence status. RESULTS: The ML model identified patients with recurrence with an AUC-ROC of 0.93 (95% CI: 0.93-0.94) in the development sample and an AUC-ROC of 0.86 (95% CI: 0.83-0.88) in the validation sample. CONCLUSION: An off-the-shelf ML model, trained using the simplistic encoding scheme, could identify patients with recurrence across multiple national registries. This approach might enable researchers and clinicians to identify patients with recurrence more accurately and more quickly, and to reduce manual interpretation of patient data.
Subjects
Breast Neoplasms , Humans , Female , Breast Neoplasms/diagnosis , Breast Neoplasms/epidemiology , Breast Neoplasms/therapy , Registries , Denmark/epidemiology , Neoplasm Recurrence, Local/diagnosis , Neoplasm Recurrence, Local/epidemiology
ABSTRACT
Patients with severe COVID-19 have overwhelmed healthcare systems worldwide. We hypothesized that machine learning (ML) models could be used to predict risks at different stages of management and thereby provide insights into drivers and prognostic markers of disease progression and death. In a cohort of approximately 2.6 million citizens in Denmark, SARS-CoV-2 PCR tests were performed on subjects suspected of having COVID-19; 3944 cases had at least one positive test and were subjected to further analysis. SARS-CoV-2 positive cases from the United Kingdom Biobank were used for external validation. The ML models predicted the risk of death with a Receiver Operating Characteristic Area Under the Curve (ROC-AUC) of 0.906 at diagnosis, 0.818 at hospital admission, and 0.721 at Intensive Care Unit (ICU) admission. Similar metrics were achieved for predicted risks of hospital and ICU admission and use of mechanical ventilation. Common risk factors included age, body mass index, and hypertension, although the top risk features shifted towards markers of shock and organ dysfunction in ICU patients. The external validation indicated fair performance for mortality prediction but suboptimal performance for predicting ICU admission. ML may be used to identify drivers of progression to more severe disease and for prognostication in patients with COVID-19. We provide access to an online risk calculator based on these findings.
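The ROC-AUC values quoted in these abstracts can be read as the probability that a randomly chosen positive case (for example, a patient who died or had a recurrence) receives a higher predicted risk than a randomly chosen negative case. The following is a minimal sketch of that rank-based definition together with a percentile bootstrap interval of the kind often used for a 95% CI. The labels and scores are hypothetical toy values, the function names are illustrative, and nothing here is taken from the studies' own code or data; the abstracts do not state how their intervals were computed.

```python
import random


def roc_auc(labels, scores):
    """Rank-based (Mann-Whitney) ROC-AUC; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class: AUC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(roc_auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy data: 1 = event (e.g. death), 0 = no event.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.90]

auc = roc_auc(labels, scores)
ci_low, ci_high = bootstrap_ci(labels, scores)
print(f"AUC = {auc:.3f}, 95% bootstrap CI: {ci_low:.3f}-{ci_high:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; for cohorts of the size described above, a sorted-rank implementation or a library routine would be the practical choice.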