ABSTRACT
BACKGROUND: Despite the increasing availability of electronic healthcare record (EHR) data and the wide availability of plug-and-play machine learning (ML) Application Programming Interfaces, the adoption of data-driven decision-making within routine hospital workflows has so far remained limited. Through the lens of deriving clusters of diagnoses by age, this study investigated the type of ML analysis that can be performed using EHR data and how results could be communicated to lay stakeholders. METHODS: After preprocessing, observational EHR data from a tertiary paediatric hospital, containing 61 522 unique patients and 3315 unique ICD-10 diagnosis codes, were used. K-means clustering was applied to identify age distributions of patient diagnoses. The final model was selected using quantitative metrics and expert assessment of the clinical validity of the clusters. Additionally, uncertainty over preprocessing decisions was analysed. FINDINGS: Four age clusters of diseases were identified, broadly aligning to ages 0 to 1, 1 to 5, 5 to 13 and 13 to 18. Diagnoses within the clusters aligned with existing knowledge regarding the propensity of presentation at different ages, and sequential clusters presented known disease progressions. The results validated similar methodologies within the literature. The impact of uncertainty induced by preprocessing decisions was large at the level of individual diagnoses but not at the population level. Strategies for mitigating, or communicating, this uncertainty were successfully demonstrated. CONCLUSION: Unsupervised ML applied to EHR data identifies clinically relevant age distributions of diagnoses, which can augment existing decision making. However, biases within healthcare datasets dramatically impact results if not appropriately mitigated or communicated.
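The clustering step described in METHODS can be illustrated with a minimal sketch. It assumes each diagnosis is represented by a normalised histogram of ages at diagnosis in one-year bins, and it uses a simple deterministic farthest-first seeding for K-means. The abstract does not state the feature representation or the initialisation used in the study, so both are assumptions, and the diagnosis names and ages below are invented toy values rather than study data.

```python
import math

def age_histogram(ages, max_age=18):
    """Fraction of a diagnosis's records falling in each one-year age bin."""
    counts = [0] * max_age
    for age in ages:
        if 0 <= age < max_age:
            counts[int(age)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * max_age

def farthest_first(points, k):
    """Assumed seeding: start at the first point, then repeatedly add the
    point that is farthest from the centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(points, k, n_iter=100):
    """Plain Lloyd iterations: assign to nearest centroid, then recompute means."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical toy data: ages (in years) at which each diagnosis was recorded.
toy_ages = {
    "diagnosis_A": [0.1, 0.3, 0.5, 0.8, 0.2],
    "diagnosis_B": [0.4, 0.6, 0.9, 1.2],
    "diagnosis_C": [2, 3, 3, 4, 2.5],
    "diagnosis_D": [1.5, 3.2, 4.1, 4.8],
    "diagnosis_E": [7, 8, 9, 10, 12],
    "diagnosis_F": [6, 8, 11, 9.5],
    "diagnosis_G": [14, 15, 16, 17],
    "diagnosis_H": [13.5, 15.2, 16.8, 17.5],
}

names = list(toy_ages)
features = [age_histogram(toy_ages[n]) for n in names]
labels, centroids = kmeans(features, k=4)
clusters = dict(zip(names, labels))
```

On this toy input the diagnoses fall into four groups by typical age at presentation. With real EHR data, the choice of binning and of the number of clusters would need the kind of quantitative and clinical review the abstract describes.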
Subjects
Electronic Health Records , Unsupervised Machine Learning , Humans , Child , Child, Preschool , Infant , Adolescent , Cluster Analysis , Infant, Newborn , Male , Female , Age Factors
ABSTRACT
Several publications have indicated potential benefit from collaboration with industry regarding wider use of anonymised routine NHS healthcare data. However, there is limited guidance on exactly how such collaborations between NHS hospitals and industry partners should best be carried out, or on the specific issues that need to be addressed at an individual project or collaboration level to achieve the desired benefit. Specifically, routine health data are complex, not collected in a format optimised for secondary use, and often require interpretation based on clinical understanding of the medical conditions or patients. To address these issues, a formal partnership was established between an NHS organisation (Great Ormond Street Hospital for Children) and a pharmaceutical company (Roche Products Limited), to jointly understand the problems that must be solved in order to maximise such use of NHS data to support improved patient outcomes and other patient/NHS benefit in a more sustainable way. We present the learnings from the first 2 years of the 5-year collaboration, addressing aspects such as the complexities of the NHS Electronic Patient Record (EPR), data engineering and the use of modern technology to optimise such data, as well as the development of appropriate technology and data infrastructure within the NHS to support interoperability and prepare the NHS for wider application of artificial intelligence. We also highlight the staff skills and training needed to support such systems in the NHS, the governance structures and processes needed to ensure appropriate use of tools and data, and how best to co-design with patients, their families, and clinical teams. It is hoped that this review may provide useful information for both healthcare organisations and industry partners working towards the future of optimal use of data and technology for healthcare benefit.
ABSTRACT
OBJECTIVE: The COVID-19 pandemic and subsequent government restrictions have had a major impact on healthcare services and disease transmission, particularly those associated with acute respiratory infection. This study used non-identifiable routine electronic patient record data from a specialist children's hospital in England, UK, to examine the effect of pandemic mitigation measures on seasonal respiratory infection rates compared with forecasts based on open-source, transferable machine learning models. METHODS: We performed a retrospective longitudinal study of respiratory disorder diagnoses between January 2010 and February 2022. All diagnoses were extracted from routine healthcare activity data and diagnosis rates were calculated for several diagnosis groups. To study changes in diagnoses, seasonal forecast models were fitted to prerestriction period data and extrapolated. RESULTS: Based on 144 704 diagnoses from 31 002 patients, all but two diagnosis groups saw a marked reduction in diagnosis rates during restrictions. We observed 91%, 89%, 72% and 63% reductions in peak diagnoses of 'respiratory syncytial virus', 'influenza', 'acute nasopharyngitis' and 'acute bronchiolitis', respectively. The machine learning predictive model calculated that total diagnoses were reduced by up to 73% (z-score: -26) versus expected during restrictions and increased by up to 27% (z-score: 8) postrestrictions. CONCLUSIONS: We demonstrate the association between COVID-19 related restrictions and significant reductions in paediatric seasonal respiratory infections. Moreover, while many infection rates have returned to expected levels postrestrictions, others remain suppressed or followed atypical winter trends. This study further demonstrates the applicability and efficacy of routine electronic record data and cross-domain time-series forecasting to model, monitor, analyse and address clinically important issues.
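The comparison of observed against expected diagnoses reported in RESULTS can be sketched with a much simpler stand-in for the forecasting model. The example below uses a per-calendar-month mean and standard deviation from prerestriction years as the "expected" value. This is a plain seasonal-mean baseline, not the machine learning model used in the study, and the function names and monthly counts are hypothetical toy values.

```python
from statistics import mean, stdev

def seasonal_baseline(history):
    """Expected count and spread per calendar month.

    history maps (year, month) to a diagnosis count from the prerestriction
    period. Each month needs at least two years of data for stdev().
    """
    by_month = {}
    for (_year, month), count in history.items():
        by_month.setdefault(month, []).append(count)
    return {m: (mean(v), stdev(v)) for m, v in by_month.items()}

def compare_to_baseline(observed, baseline):
    """Percentage change and z-score of each observed count versus baseline."""
    result = {}
    for (year, month), count in observed.items():
        expected, sd = baseline[month]
        pct_change = 100.0 * (count - expected) / expected
        z_score = (count - expected) / sd if sd > 0 else float("nan")
        result[(year, month)] = (pct_change, z_score)
    return result

# Hypothetical prerestriction counts for two calendar months over three years.
history = {
    (2017, 12): 90, (2018, 12): 100, (2019, 12): 110,
    (2017, 6): 18, (2018, 6): 20, (2019, 6): 22,
}
# Hypothetical counts observed during restrictions.
observed = {(2020, 12): 40, (2020, 6): 21}

baseline = seasonal_baseline(history)
changes = compare_to_baseline(observed, baseline)
dec_pct, dec_z = changes[(2020, 12)]
jun_pct, jun_z = changes[(2020, 6)]
```

With these toy numbers the December count is 60% below the seasonal expectation, six standard deviations under it, while the June count is within normal variation. A fitted forecast model would replace `seasonal_baseline` and could also account for long-term trend, which this sketch ignores.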