ABSTRACT
For over a decade, Scotland has implemented and operationalized a system of Safe Havens, which provides secure analytics platforms for researchers to access linked, deidentified electronic health records (EHRs) while managing the risk of unauthorized reidentification. In this paper, a perspective is provided on the state-of-the-art Scottish Safe Haven network, including its evolution, to define the key activities required to scale the Scottish Safe Haven network's capability to facilitate research and health care improvement initiatives. A set of processes related to EHR data and their delivery in Scotland have been discussed. An interview with each Safe Haven was conducted to understand their services in detail, as well as their commonalities. The results show how Safe Havens in Scotland have protected privacy while facilitating the reuse of the EHR data. This study provides a common definition of a Safe Haven and promotes a consistent understanding among the Scottish Safe Haven network and the clinical and academic research community. We conclude by identifying areas where efficiencies across the network can be made to meet the needs of population-level studies at scale.
Subject(s)
Electronic Health Records , Privacy , Humans , ScotlandABSTRACT
Leakage of data from publicly available Machine Learning (ML) models is an area of growing significance since commercial and government applications of ML can draw on multiple sources of data, potentially including users' and clients' sensitive data. We provide a comprehensive survey of contemporary advances on several fronts, covering involuntary data leakage which is natural to ML models, potential malicious leakage which is caused by privacy attacks, and currently available defence mechanisms. We focus on inference-time leakage, as the most likely scenario for publicly available models. We first discuss what leakage is in the context of different data, tasks, and model architectures. We then propose a taxonomy across involuntary and malicious leakage, followed by description of currently available defences, assessment metrics, and applications. We conclude with outstanding challenges and open questions, outlining some promising directions for future research.
ABSTRACT
Metabolic dysfunction-associated steatotic liver disease (MASLD) is the commonest cause of chronic liver disease worldwide and represents an unmet precision medicine challenge. We established a retrospective national cohort of 940 histologically defined patients (55.4% men, 44.6% women; median body mass index 31.3; 32% with type 2 diabetes) covering the complete MASLD severity spectrum, and created a secure, searchable, open resource (SteatoSITE). In 668 cases and 39 controls, we generated hepatic bulk RNA sequencing data and performed differential gene expression and pathway analysis, including exploration of gender-specific differences. A web-based gene browser was also developed. We integrated histopathological assessments, transcriptomic data and 5.67 million days of time-stamped longitudinal electronic health record data to define disease-stage-specific gene expression signatures, pathogenic hepatic cell subpopulations and master regulator networks associated with adverse outcomes in MASLD. We constructed a 15-gene transcriptional risk score to predict future hepatic decompensation events (area under the receiver operating characteristic curve 0.86, 0.81 and 0.83 for 1-, 3- and 5-year risk, respectively). Additionally, thyroid hormone receptor beta regulon activity was identified as a critical suppressor of disease progression. SteatoSITE supports rational biomarker and drug development and facilitates precision medicine approaches for patients with MASLD.