ABSTRACT
Precision medicine relies on molecular and systems biology methods as well as bidirectional association studies of phenotypes and (high-throughput) genomic data. However, the integrated use of such data often faces obstacles, especially with regard to data protection. An important prerequisite for research data processing is usually informed consent. But collecting consent is not always feasible, in particular when data are to be analyzed retrospectively. For phenotype data, anonymization, i.e. the altering of data in such a way that individuals cannot be identified, can provide an alternative. Several re-identification attacks have shown that this is a complex task and that simply removing directly identifying attributes such as names is usually not enough. More formal approaches are needed that use mathematical models to quantify risks and guide their reduction. Due to the complexity of these techniques, it is challenging and not advisable to implement them from scratch. Open software libraries and tools can provide a robust alternative. However, the range of available anonymization tools is also heterogeneous, and obtaining an overview of their strengths and weaknesses is difficult due to the complexity of the problem space. We therefore performed a systematic review of open anonymization tools for structured phenotype data described in the literature between 1990 and 2021. Through a two-step eligibility assessment process, we selected 13 tools for an in-depth analysis. By comparing the supported anonymization techniques and further aspects, such as maturity, we derived recommendations for tools to use for anonymizing phenotype datasets with different properties.
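As a concrete illustration of the formal risk models such tools implement, the sketch below computes per-record prosecutor re-identification risk from equivalence-class sizes; the column names and toy data are assumptions for illustration, not drawn from any reviewed tool.

```python
# Minimal sketch of a formal re-identification risk model: prosecutor risk
# under k-anonymity. Column names and records are illustrative assumptions.
from collections import Counter

def prosecutor_risks(records, quasi_identifiers):
    """Risk per record = 1 / size of its equivalence class, where an
    equivalence class groups records sharing all quasi-identifier values."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    return [1.0 / class_sizes[k] for k in keys]

records = [
    {"age": "40-49", "zip": "537**", "diagnosis": "I10"},
    {"age": "40-49", "zip": "537**", "diagnosis": "E11"},
    {"age": "50-59", "zip": "537**", "diagnosis": "I10"},
]
risks = prosecutor_risks(records, ["age", "zip"])
print(max(risks))  # 1.0 -> the third record is unique (k = 1 for it)
```

A tool would compare such risks against a threshold and generalize or suppress values until every record falls below it.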
Subject(s)
Biomedical Research, Privacy, Retrospective Studies, Data Anonymization, Phenotype
ABSTRACT
PURPOSE: Medical reports, governed by HIPAA regulations, contain personal health information (PHI), restricting secondary data use. Utilizing natural language processing (NLP) and large language models (LLM), we sought to employ publicly available methods to automatically anonymize PHI in free-text radiology reports. MATERIALS AND METHODS: We compared two publicly available rule-based NLP models (spaCy; NLPac, accuracy-optimized; NLPsp, speed-optimized; iteratively improved on 400 free-text CT reports (test set)) and one offline LLM approach (LLM-model, LLaMa-2, Meta-AI) for PHI anonymization. The three models were tested on 100 randomly selected chest CT reports. Two investigators assessed the anonymization of occurring PHI entities and whether clinical information was removed. Subsequently, precision, recall, and F1 scores were calculated. RESULTS: NLPac and NLPsp successfully removed all instances of dates (n = 333), medical record numbers (MRN) (n = 6), and accession numbers (ACC) (n = 92). The LLM model removed all MRNs, 96% of ACCs, and 32% of dates. NLPac was most consistent with a perfect F1-score of 1.00, followed by NLPsp with lower precision (0.86) and F1-score (0.92) for dates. The LLM model had perfect precision for MRNs, ACCs, and dates but the lowest recall for ACCs (0.96) and dates (0.52), with corresponding F1 scores of 0.98 and 0.68, respectively. Names were removed completely or nearly completely (i.e., one first or family name left non-anonymized) in 100% (NLPac), 72% (NLPsp), and 90% (LLM-model) of reports. Importantly, NLPac and NLPsp did not remove medical information, while the LLM model did in 10% of reports (n = 10). CONCLUSION: Pre-trained NLP models can effectively anonymize free-text radiology reports, while anonymization with the LLM model is more prone to deleting medical information. KEY POINTS: Question This study compares NLP and locally hosted LLM techniques to ensure PHI anonymization without losing clinical information. Findings Pre-trained NLP models effectively anonymized radiology reports without removing clinical data, while a locally hosted LLM was less reliable, risking the loss of important information. Clinical relevance Fast, reliable, automated anonymization of PHI from radiology reports enables HIPAA-compliant secondary use, facilitating advanced applications like LLM-driven radiology analysis while ensuring ethical handling of sensitive patient data.
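For intuition, the sketch below shows a minimal rule-plus-NER scrubber in the spirit of the compared spaCy pipelines; the regex patterns, placeholder tokens, and model choice are illustrative assumptions, not the study's NLPac/NLPsp configurations.

```python
# Illustrative PHI scrubber: deterministic rules for structured identifiers,
# statistical NER for names. Patterns and placeholders are assumptions.
import re
import spacy

DATE_RE = re.compile(r"\b\d{1,2}[./-]\d{1,2}[./-]\d{2,4}\b")
MRN_RE = re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE)
ACC_RE = re.compile(r"\bACC(?:ESSION)?[:\s#]*\d{6,12}\b", re.IGNORECASE)

# Requires: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def scrub(report: str) -> str:
    # Pass 1: rules for dates, medical record and accession numbers.
    text = DATE_RE.sub("[DATE]", report)
    text = MRN_RE.sub("[MRN]", text)
    text = ACC_RE.sub("[ACC]", text)
    # Pass 2: NER for names the rules cannot anticipate.
    doc = nlp(text)
    for ent in reversed(doc.ents):  # reversed so character offsets stay valid
        if ent.label_ == "PERSON":
            text = text[:ent.start_char] + "[NAME]" + text[ent.end_char:]
    return text

print(scrub("Dr. Jane Miller, MRN: 1234567, CT chest on 03/14/2023."))
```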
ABSTRACT
BACKGROUND: Sharing data from clinical studies can accelerate scientific progress, improve transparency, and increase the potential for innovation and collaboration. However, privacy concerns remain a barrier to data sharing. Certain concerns, such as reidentification risk, can be addressed through the application of anonymization algorithms, whereby data are altered so that they can no longer reasonably be related to a person. Yet, such alterations have the potential to influence the data set's statistical properties, such that the privacy-utility trade-off must be considered. This has been studied in theory, but evidence based on real-world individual-level clinical data is rare, and anonymization has not been broadly adopted in clinical practice. OBJECTIVE: The goal of this study is to contribute to a better understanding of anonymization in the real world by comprehensively evaluating the privacy-utility trade-off of differently anonymized data using data and scientific results from the German Chronic Kidney Disease (GCKD) study. METHODS: The GCKD data set extracted for this study consists of 5217 records and 70 variables. A 2-step procedure was followed to determine which variables constituted reidentification risks. To capture a large portion of the risk-utility space, we decided on risk thresholds ranging from 0.02 to 1. The data were then transformed via generalization and suppression, and the anonymization process was varied using a generic and a use case-specific configuration. To assess the utility of the anonymized GCKD data, general-purpose metrics (ie, data granularity and entropy), as well as use case-specific metrics (ie, reproducibility), were applied. Reproducibility was assessed by measuring the overlap of the 95% CI lengths between anonymized and original results. RESULTS: Reproducibility measured by 95% CI overlap was higher than utility obtained from general-purpose metrics. For example, granularity varied between 68.2% and 87.6%, and entropy varied between 25.5% and 46.2%, whereas the average 95% CI overlap was above 90% for all risk thresholds applied. A nonoverlapping 95% CI was detected in 6 estimates across all analyses, but the overwhelming majority of estimates exhibited an overlap over 50%. The use case-specific configuration outperformed the generic one in terms of actual utility (ie, reproducibility) at the same level of privacy. CONCLUSIONS: Our results illustrate the challenges that anonymization faces when aiming to support multiple likely and possibly competing uses, while use case-specific anonymization can provide greater utility. This aspect should be taken into account when evaluating the associated costs of anonymized data and attempting to maintain sufficiently high levels of privacy for anonymized data. TRIAL REGISTRATION: German Clinical Trials Register DRKS00003971; https://drks.de/search/en/trial/DRKS00003971. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR2-10.1093/ndt/gfr456.
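The reproducibility metric can be made concrete with a small sketch; the exact overlap definition used in the GCKD analysis may differ, so the normalization below (intersection length relative to the original CI length) is an assumption.

```python
# One plausible reading of the 95% CI overlap utility metric: length of
# the intersection of the anonymized and original CIs, relative to the
# original CI length. A sketch for intuition, not the study's code.
def ci_overlap(original, anonymized):
    """Each argument is a (lower, upper) 95% CI tuple."""
    lo = max(original[0], anonymized[0])
    hi = min(original[1], anonymized[1])
    intersection = max(0.0, hi - lo)
    return intersection / (original[1] - original[0])

print(ci_overlap((1.20, 1.80), (1.25, 1.90)))  # ~0.92 -> high reproducibility
print(ci_overlap((1.20, 1.80), (2.00, 2.50)))  # 0.0   -> non-overlapping CIs
```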
Subject(s)
Data Anonymization, Humans, Renal Insufficiency, Chronic/therapy, Information Dissemination/methods, Algorithms, Germany, Confidentiality, Privacy
ABSTRACT
INTRODUCTION: Recent technological advances have increased the risk that de-identified brain images could be re-identified from face imagery. The Alzheimer's Disease Neuroimaging Initiative (ADNI) is a leading source of publicly available de-identified brain imaging, which quickly acted to protect participants' privacy. METHODS: An independent expert committee evaluated 11 face-deidentification ("de-facing") methods and selected four for formal testing. RESULTS: Effects of de-facing on brain measurements were comparable across methods and sufficiently small to recommend de-facing in ADNI. The committee ultimately recommended mri_reface for advantages in reliability, and for some practical considerations. ADNI leadership approved the committee's recommendation, beginning in ADNI4. DISCUSSION: ADNI4 de-faces all applicable brain images before subsequent pre-processing, analyses, and public release. Trained analysts inspect de-faced images to confirm complete face removal and complete non-modification of the brain. This paper details the history of the algorithm selection process and extensive validation, then describes the production workflows for de-facing in ADNI. HIGHLIGHTS: ADNI is implementing "de-facing" of MRI and PET beginning in ADNI4. "De-facing" alters face imagery in brain images to help protect privacy. Four algorithms were extensively compared for ADNI and mri_reface was chosen. Validation confirms mri_reface is robust and effective for ADNI sequences. Validation confirms mri_reface negligibly affects ADNI brain measurements.
ABSTRACT
Broad access to health data offers great potential for science and research. However, health data often contain sensitive information that must be protected in a special way. In this context, the article deals with the re-identification potential of health data. After defining the relevant terms, we discuss factors that influence the re-identification potential. We summarize international privacy standards for health data and highlight the importance of background knowledge. Given that the re-identification potential is often underestimated in practice, we present strategies for mitigation based on the Five Safes concept. We also discuss classical data protection strategies as well as methods for generating synthetic health data. The article concludes with a brief discussion and outlook on the planned Health Data Lab at the Federal Institute for Drugs and Medical Devices.
Subject(s)
Computer Security, Privacy, Germany, Confidentiality
ABSTRACT
It is now widely known that research brain MRI, CT, and PET images may potentially be re-identified using face recognition, and this potential can be reduced by applying face-deidentification ("de-facing") software. However, for research MRI sequences beyond T1-weighted (T1-w) and T2-FLAIR structural images, the potential for re-identification and quantitative effects of de-facing are both unknown, and the effects of de-facing T2-FLAIR are also unknown. In this work we examine these questions (where applicable) for T1-w, T2-w, T2*-w, T2-FLAIR, diffusion MRI (dMRI), functional MRI (fMRI), and arterial spin labelling (ASL) sequences. Among current-generation, vendor-product research-grade sequences, we found that 3D T1-w, T2-w, and T2-FLAIR were highly re-identifiable (96-98%). 2D T2-FLAIR and 3D multi-echo GRE (ME-GRE) were moderately re-identifiable (44-45%), and our derived T2* from ME-GRE (comparable to a typical 2D T2*) matched at only 10%. Finally, diffusion, functional and ASL images were each minimally re-identifiable (0-8%). Applying de-facing with mri_reface version 0.3 reduced successful re-identification to ≤8%, while differential effects on popular quantitative pipelines for cortical volumes and thickness, white matter hyperintensities (WMH), and quantitative susceptibility mapping (QSM) measurements were all either comparable with or smaller than scan-rescan estimates. Consequently, high-quality de-facing software can greatly reduce the risk of re-identification for identifiable MRI sequences with only negligible effects on automated intracranial measurements. The current-generation echo-planar and spiral sequences (dMRI, fMRI, and ASL) each had minimal match rates, suggesting that they have a low risk of re-identification and can be shared without de-facing, but this conclusion should be re-evaluated if they are acquired without fat suppression, with full-face scan coverage, or if newer developments reduce the current levels of artifacts and distortion around the face.
Subject(s)
Diffusion Magnetic Resonance Imaging, Magnetic Resonance Imaging, Humans, Magnetic Resonance Imaging/methods, Diffusion Magnetic Resonance Imaging/methods, Neuroimaging, Artifacts, Spin Labels
ABSTRACT
BACKGROUND: Subject-level real-world data (RWD) collected during daily healthcare practices are increasingly used in medical research to assess questions that cannot be addressed in the context of a randomized controlled trial (RCT). A novel application of RWD arises from the need to create external control arms (ECAs) for single-arm RCTs. In the analysis of ECAs against RCT data, there is an evident need to manage and analyze RCT data and RWD in the same technical environment. In the Nordic countries, legal requirements may require that the original subject-level data be anonymized, i.e., modified so that the risk to identify any individual is minimal. The aim of this study was to conduct initial exploration on how well pseudonymized and anonymized RWD perform in the creation of an ECA for an RCT. METHODS: This was a hybrid observational cohort study using clinical data from the control arm of the completed randomized phase II clinical trial (PACIFIC-AF) and an RWD cohort from Finnish healthcare data sources. The initial pseudonymized RWD were anonymized within the (k, ε)-anonymity framework (a model for protecting individuals against identification). Propensity score matching and weighting methods were applied to the anonymized and pseudonymized RWD, to balance potential confounders against the RCT data. Descriptive statistics for the potential confounders and overall survival analyses were conducted prior to and after matching and weighting, using both the pseudonymized and anonymized RWD sets. RESULTS: Anonymization affected the baseline characteristics of potential confounders only marginally. The greatest difference was in the prevalence of chronic obstructive pulmonary disease (4.6% vs. 5.4% in the pseudonymized compared to the anonymized data, respectively). Moreover, overall survival changed under anonymization by only 8% (95% CI 4-22%). Both the pseudonymized and anonymized RWD were able to produce matched ECAs for the RCT data. Anonymization after matching impacted the overall survival analysis by 22% (95% CI -21-87%). CONCLUSIONS: Anonymization may be a viable technique for cases where flexible data transfer and sharing are required. As anonymization necessarily affects some aspects of the original data, further research and careful consideration of anonymization strategies are needed.
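A minimal sketch of the propensity score matching step follows, assuming a logistic-regression score, 1:1 nearest-neighbor matching without replacement, and an illustrative caliper; none of these settings are taken from the study, and the data are synthetic.

```python
# Propensity score matching sketch: balance an RWD cohort against an RCT
# control arm on measured confounders. All names and values are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))            # confounders (e.g. age, eGFR, COPD)
in_rct = rng.integers(0, 2, size=200)    # 1 = RCT control arm, 0 = RWD

# Propensity score: P(in RCT | confounders)
ps = LogisticRegression().fit(X, in_rct).predict_proba(X)[:, 1]

# 1:1 nearest-neighbor matching without replacement, with a caliper
caliper = 0.05
rct_idx = np.where(in_rct == 1)[0]
rwd_idx = list(np.where(in_rct == 0)[0])
pairs = []
for i in rct_idx:
    if not rwd_idx:
        break
    j = min(rwd_idx, key=lambda j: abs(ps[i] - ps[j]))
    if abs(ps[i] - ps[j]) <= caliper:
        pairs.append((i, j))
        rwd_idx.remove(j)        # without replacement
print(f"matched {len(pairs)} RCT/RWD pairs")
```

The same matching would be run twice, once on the pseudonymized and once on the anonymized RWD, and the resulting survival estimates compared.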
Subject(s)
Biomedical Research, Data Anonymization, Humans, Biomedical Research/methods, Randomized Controlled Trials as Topic, Clinical Trials, Phase II as Topic
ABSTRACT
BACKGROUND: YouTube has become a popular source of health care information, reaching an estimated 81% of adults in 2021; approximately 35% of adults in the United States have used the internet to self-diagnose a condition. Public health researchers are therefore incorporating YouTube data into their research, but guidelines for best practices around research ethics using social media data, such as YouTube, are unclear. OBJECTIVE: This study aims to describe approaches to research ethics for public health research implemented using YouTube data. METHODS: We implemented a systematic review of articles found in PubMed, SocINDEX, Web of Science, and PsycINFO following PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. To be eligible to be included, studies needed to be published in peer-reviewed journals in English between January 1, 2006, and October 31, 2019, and include analyses on publicly available YouTube data on health or public health topics; studies using primary data collection, such as using YouTube for study recruitment, interventions, or dissemination evaluations, were not included. We extracted data on the presence of user identifying information, institutional review board (IRB) review, and informed consent processes, as well as research topic and methodology. RESULTS: This review includes 119 articles from 88 journals. The most common health and public health topics studied were in the categories of chronic diseases (44/119, 37%), mental health and substance use (26/119, 21.8%), and infectious diseases (20/119, 16.8%). The majority (82/119, 68.9%) of articles made no mention of ethical considerations or stated that the study did not meet the definition of human participant research (16/119, 13.4%). Of those that sought IRB review (15/119, 12.6%), 12 out of 15 (80%) were determined to not meet the definition of human participant research and were therefore exempt from IRB review, and 3 out of 15 (20%) received IRB approval. None of the 3 IRB-approved studies contained identifying information; one was explicitly told not to include identifying information by their ethics committee. Only 1 study sought informed consent from YouTube users. Of 119 articles, 33 (27.7%) contained identifying information about content creators or video commenters, one of which attempted to anonymize direct quotes by not including user information. CONCLUSIONS: Given the variation in practice, concrete guidelines on research ethics for social media research are needed, especially around anonymizing and seeking consent when using identifying information. TRIAL REGISTRATION: PROSPERO CRD42020148170; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=148170.
Subject(s)
Ethics, Research, Social Media, Adult, Humans, Data Collection, Ethics Committees, Research, Informed Consent
ABSTRACT
BACKGROUND: Electronic health records (EHRs) in unstructured formats are valuable sources of information for research in both the clinical and biomedical domains. However, before such records can be used for research purposes, sensitive health information (SHI) must in many cases be removed to protect patient privacy. Rule-based and machine learning-based methods have been shown to be effective in deidentification. However, very few studies have investigated the combination of transformer-based language models and rules. OBJECTIVE: The objective of this study is to develop a hybrid deidentification pipeline for Australian EHR text notes using rules and transformers. The study also aims to investigate the impact of pretrained word embedding and transformer-based language models. METHODS: In this study, we present a hybrid deidentification pipeline called OpenDeID, which is developed using an Australian multicenter EHR-based corpus called OpenDeID Corpus. The OpenDeID corpus consists of 2100 pathology reports with 38,414 SHI entities from 1833 patients. The OpenDeID pipeline incorporates a hybrid approach of associative rules, supervised deep learning, and pretrained language models. RESULTS: OpenDeID achieved its best F1-score of 0.9659 by fine-tuning the Discharge Summary BioBERT model and incorporating various preprocessing and postprocessing rules. The OpenDeID pipeline has been deployed at a large tertiary teaching hospital and has processed over 8000 unstructured EHR text notes in real time. CONCLUSIONS: OpenDeID is a hybrid pipeline for deidentifying SHI entities in unstructured EHR text notes. The pipeline has been evaluated on a large multicenter corpus. External validation will be undertaken as part of our future work to evaluate the effectiveness of the OpenDeID pipeline.
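The reported F1-score rests on comparing predicted SHI spans against gold annotations; the sketch below shows that computation under the assumption of exact entity-level span matching, with toy offsets.

```python
# Entity-level precision/recall/F1 for deidentification output, assuming
# exact-match scoring of (start, end, label) spans. Toy data, not the corpus.
def prf1(gold: set, predicted: set):
    tp = len(gold & predicted)                      # exact span+label matches
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = {(0, 10, "NAME"), (25, 35, "DATE"), (50, 57, "MRN")}
pred = {(0, 10, "NAME"), (25, 35, "DATE"), (60, 66, "MRN")}
print(prf1(gold, pred))  # (0.667, 0.667, 0.667)
```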
Subject(s)
Data Anonymization, Electronic Health Records, Humans, Australia, Algorithms, Hospitals, Teaching
ABSTRACT
The Internet of Vehicles (IoV) is an innovative paradigm that ensures a safe journey by communicating with other vehicles. It involves a basic safety message (BSM) that contains sensitive information in plain text that can be subverted by an adversary. To reduce such attacks, a pool of pseudonyms is allotted, which are changed regularly in different zones or contexts. In base schemes, the BSM is sent to neighbors just by considering their speed. However, this parameter is not enough because the network topology is very dynamic and vehicles can change their route at any time. This increases pseudonym consumption, which ultimately increases communication overhead and traceability and leads to high BSM loss. This paper presents an efficient pseudonym consumption protocol (EPCP) that considers vehicles traveling in the same direction and with a similar estimated location, and shares the BSM only with these relevant vehicles. The performance of the proposed scheme was validated against base schemes via extensive simulations. The results show that the proposed EPCP technique outperforms its counterparts in terms of pseudonym consumption, BSM loss rate, and traceability.
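A hedged sketch of the relevance filter EPCP describes follows: forward a BSM only to neighbors with a similar heading and a close dead-reckoned position. The thresholds, field names, and prediction horizon are illustrative assumptions, not values from the paper.

```python
# Relevance filter sketch: same direction + similar estimated location.
# All parameters are assumed for illustration.
import math
from dataclasses import dataclass

@dataclass
class Vehicle:
    x: float          # position (m)
    y: float
    speed: float      # m/s
    heading: float    # radians

def estimated_position(v: Vehicle, dt: float):
    """Dead-reckoned position dt seconds ahead."""
    return (v.x + v.speed * dt * math.cos(v.heading),
            v.y + v.speed * dt * math.sin(v.heading))

def relevant_neighbors(me, neighbors, dt=5.0, max_dist=100.0, max_angle=0.26):
    mx, my = estimated_position(me, dt)
    out = []
    for n in neighbors:
        # ~15 degree heading tolerance (ignoring wrap-around for brevity)
        same_dir = abs(me.heading - n.heading) <= max_angle
        nx, ny = estimated_position(n, dt)
        close = math.hypot(mx - nx, my - ny) <= max_dist
        if same_dir and close:
            out.append(n)
    return out  # the BSM is sent only to these vehicles

me = Vehicle(0, 0, 30, 0.0)
others = [Vehicle(40, 3, 28, 0.05), Vehicle(60, -4, 25, math.pi)]
print(len(relevant_neighbors(me, others)))  # 1: the oncoming vehicle is skipped
```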
Subject(s)
Anonyms and Pseudonyms, Computer Security, Internet, Communication
ABSTRACT
Data anonymization is a technique that safeguards individuals' privacy by modifying attribute values in published data. However, increased modifications enhance privacy but diminish the utility of published data, necessitating a balance between privacy and utility levels. k-Anonymity is a crucial anonymization technique that generates k-anonymous clusters, where the probability of disclosing a record is 1/k. However, k-anonymity fails to protect against attribute disclosure when the diversity of sensitive values within the anonymous cluster is insufficient. Several techniques have been proposed to address this issue, among which t-closeness is considered one of the most robust privacy techniques. In this paper, we propose a novel approach employing a greedy and information-theoretic clustering-based algorithm to achieve strict privacy protection. The proposed anonymization algorithm commences by clustering the data based on both the similarity of quasi-identifier values and the diversity of sensitive attribute values. In the subsequent adjustment phase, the algorithm splits and merges the clusters to ensure that they each possess at least k members and adhere to the t-closeness requirements. Finally, the algorithm replaces the quasi-identifier values of the records in each cluster with the values of the cluster center to attain k-anonymity and t-closeness. Experimental results on three microdata sets from Facebook, Twitter, and Google+ demonstrate the proposed algorithm's ability to preserve the utility of released data by minimizing the modifications of attribute values while satisfying the k-anonymity and t-closeness constraints.
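The t-closeness requirement checked in the adjustment phase can be sketched as follows; for brevity this uses total variation distance over categorical sensitive values rather than the Earth Mover's Distance usually paired with t-closeness, which is a simplifying assumption, and the data are synthetic.

```python
# t-closeness check sketch: the sensitive-attribute distribution within a
# cluster must stay within distance t of the global distribution.
from collections import Counter

def distribution(values):
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}

def t_close(cluster_values, global_values, t):
    p, q = distribution(cluster_values), distribution(global_values)
    # Total variation distance as a simplified stand-in for EMD
    tvd = 0.5 * sum(abs(p.get(v, 0) - q.get(v, 0)) for v in set(p) | set(q))
    return tvd <= t

global_sensitive = ["flu"] * 6 + ["cancer"] * 3 + ["hiv"] * 1
print(t_close(["flu", "flu", "cancer"], global_sensitive, t=0.2))  # True
print(t_close(["hiv", "hiv", "hiv"], global_sensitive, t=0.2))     # False
```

A cluster failing the check would be split or merged with another, as the adjustment phase describes, before its quasi-identifiers are replaced by the cluster center.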
ABSTRACT
Healthcare data held by state-run organisations is a valuable intangible asset for society. Its use should be a priority for its administrators and the state. A completely paternalistic approach by administrators and the state is undesirable, however much it aims to protect the privacy rights of persons registered in databases. In line with European policies and the global trend, these measures should not outweigh the social benefit that arises from the analysis of these data if the technical possibilities exist to sufficiently protect the privacy rights of individuals. Czech society is having an intense discussion on the topic, but according to the authors, it is insufficiently based on facts and lacks clearly articulated opinions of the expert public. The aim of this article is to fill these gaps. Data anonymization techniques provide a solution to protect individuals' privacy rights while preserving the scientific value of the data. The risk of identifying individuals in anonymised data sets is scalable and can be minimised depending on the type and content of the data and its use by the specific applicant. Finding the optimal form and scope of deidentified data requires competence and knowledge on the part of both the applicant and the administrator. It is in the interest of the applicant, the administrator, as well as the protected persons in the databases that both parties show willingness and have the ability and expertise to communicate during the application and its processing.
Subject(s)
Confidentiality, Data Anonymization, Humans, Privacy
ABSTRACT
It is well known that de-identified research brain images from MRI and CT can potentially be re-identified using face recognition; however, this has not been examined for PET images. We generated face reconstruction images of 182 volunteers using amyloid, tau, and FDG PET scans, and we measured how accurately commercial face recognition software (Microsoft Azure's Face API) automatically matched them with the individual participants' face photographs. We then compared this accuracy with the same experiments using participants' CT and MRI. Face reconstructions from PET images from PET/CT scanners were correctly matched at rates of 42% (FDG), 35% (tau), and 32% (amyloid), while CT images were matched at 78% and MRI at 97-98%. We propose that these recognition rates are high enough that research studies should consider using face de-identification ("de-facing") software on PET images, in addition to CT and structural MRI, before data sharing. We also updated our mri_reface de-identification software with extended functionality to replace face imagery in PET and CT images. Rates of face recognition on de-faced images were reduced to 0-4% for PET, 5% for CT, and 8% for MRI. We measured the effects of de-facing on regional amyloid PET measurements from two different measurement pipelines (PETSurfer/FreeSurfer 6.0, and one in-house method based on SPM12 and ANTs), and these effects were small: ICC values between de-faced and original images were >0.98, biases were <2%, and median relative errors were <2%. Effects on global amyloid PET SUVR measurements were even smaller: ICC values were 1.00, biases were <0.5%, and median relative errors were also <0.5%.
Subject(s)
Facial Recognition, Positron Emission Tomography Computed Tomography, Amyloid, Brain/diagnostic imaging, Fluorodeoxyglucose F18, Humans, Magnetic Resonance Imaging/methods, Positron-Emission Tomography/methods
ABSTRACT
With the advancing of location-detection technologies and the increasing popularity of mobile phones and other location-aware devices, trajectory data is continuously growing. While large-scale trajectories provide opportunities for various applications, the locations in trajectories pose a threat to individual privacy. Recently, there has been an interesting debate on the reidentifiability of individuals in Science magazine. The main finding of Sánchez et al. is exactly opposite to that of De Montjoye et al., which raises the first question: "what is the true situation of the privacy preservation for trajectories in terms of reidentification?" Furthermore, it is known that anonymization typically causes a decline of data utility, and anonymization mechanisms need to consider the trade-off between privacy and utility. This raises the second question: "what is the true situation of the utility of anonymized trajectories?" To answer these two questions, we conduct a systematic experimental study, using three real-life trajectory datasets, five existing anonymization mechanisms (i.e., identifier anonymization, grid-based anonymization, dummy trajectories, k-anonymity and ε-differential privacy), and two practical applications (i.e., travel time estimation and window range queries). Our findings reveal the true situation of the privacy preservation for trajectories in terms of reidentification and the true situation of the utility of anonymized trajectories, and essentially close the debate between De Montjoye et al. and Sánchez et al. To the best of our knowledge, this study is among the first systematic evaluation and analysis of anonymized trajectories on the individual privacy in terms of unicity and on the utility in terms of practical applications. Supplementary Information: The online version contains supplementary material available at 10.1007/s11390-022-2409-x.
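The unicity notion at the center of the debate can be sketched as the fraction of random p-point samples that single out exactly one trajectory; the data format and Monte Carlo sampling scheme below are illustrative assumptions.

```python
# Unicity sketch: estimate, over random draws, how often p spatio-temporal
# points from a trajectory match that trajectory alone. Toy data format.
import random

def unicity(trajectories, p, trials=1000, seed=42):
    rng = random.Random(seed)
    unique = 0
    for _ in range(trials):
        traj = rng.choice(trajectories)
        sample = set(rng.sample(sorted(traj), min(p, len(traj))))
        matches = sum(1 for t in trajectories if sample <= t)
        unique += (matches == 1)
    return unique / trials  # fraction of draws pinning down one trajectory

# Each trajectory is a set of (cell_id, hour) points
trajs = [
    {("A", 8), ("B", 9), ("C", 18)},
    {("A", 8), ("B", 9), ("D", 18)},
    {("E", 7), ("B", 9), ("C", 18)},
]
print(unicity(trajs, p=2))  # a few points often suffice to single someone out
```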
ABSTRACT
Recent advances in automated face recognition algorithms have increased the risk that de-identified research MRI scans may be re-identifiable by matching them to identified photographs using face recognition. A variety of software packages exist to de-face (remove faces from) MRI, but their ability to prevent face recognition has never been measured, and their image modifications can alter automated brain measurements. In this study, we compared three popular de-facing techniques and introduced our mri_reface technique, designed to minimize effects on brain measurements by replacing the face with a population average rather than removing it. For each technique, we measured 1) how well it prevented automated face recognition (i.e., effects on exceptionally motivated individuals) and 2) how it altered brain measurements from SPM12, FreeSurfer, and FSL (i.e., effects on the average user of de-identified data). Before de-facing, 97% of scans from a sample of 157 volunteers were correctly matched to photographs using automated face recognition. After de-facing with popular software, 28-38% of scans still retained enough data for successful automated face matching. Our proposed mri_reface performed similarly to the best existing method (fsl_deface) at preventing face recognition (28-30%), and it had the smallest effects on brain measurements in more pipelines than any other method, although these differences were modest.
Subject(s)
Automated Facial Recognition/methods, Biomedical Research/methods, Brain/diagnostic imaging, Image Processing, Computer-Assisted/methods, Magnetic Resonance Imaging/methods, Neuroimaging/methods, Adult, Aged, Aged, 80 and over, Algorithms, Automated Facial Recognition/trends, Brain/physiology, Female, Humans, Image Processing, Computer-Assisted/trends, Magnetic Resonance Imaging/trends, Male, Middle Aged, Neuroimaging/trends, Software/trends
ABSTRACT
Surface rendering of MRI brain scans may lead to identification of the participant through facial characteristics. In this study, we evaluate three methods that overwrite voxels containing privacy-sensitive information: Face Masking, FreeSurfer defacing, and FSL defacing. We included structural T1-weighted MRI scans of children, young adults, and older adults. For the young adults, test-retest data were included with a 1-week interval. The effects of the de-identification methods were quantified using different statistics to capture random variation and systematic noise in measures obtained through the FreeSurfer processing pipeline. Face Masking and FSL defacing impacted brain voxels in some scans, especially in younger participants. FreeSurfer defacing left brain tissue intact in all cases. FSL defacing and FreeSurfer defacing preserved identifiable characteristics around the eyes or mouth in some scans. For all de-identification methods, regional brain measures of subcortical volume, cortical volume, cortical surface area, and cortical thickness were on average highly replicable when derived from original versus de-identified scans, with average regional correlations >.90 for children, young adults, and older adults. Small systematic biases were found that incidentally resulted in significantly different brain measures after de-identification, depending on the studied subsample, de-identification method, and brain metric. In young adults, test-retest intraclass correlation coefficients (ICCs) were comparable for original scans and de-identified scans, with average regional ICCs >.90 for (sub)cortical volume and cortical surface area and ICCs >.80 for cortical thickness. We conclude that apparent visual differences between de-identification methods minimally impact reliability of brain measures, although small systematic biases can occur.
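For intuition, the sketch below computes a two-way consistency ICC, i.e., ICC(3,1), on synthetic paired measures from original and de-faced scans; the study's exact ICC variant is not restated here, so this choice, and the toy data, are assumptions.

```python
# Reliability sketch: ICC(3,1) (two-way mixed, consistency) for a regional
# brain measure derived from original vs. de-identified scans. Synthetic data.
import numpy as np

def icc_3_1(Y):
    """Y: subjects x measurements (e.g. column 0 = original scan,
    column 1 = de-faced scan) for one regional brain measure."""
    n, k = Y.shape
    grand = Y.mean()
    ms_rows = k * ((Y.mean(axis=1) - grand) ** 2).sum() / (n - 1)
    resid = Y - Y.mean(axis=1, keepdims=True) - Y.mean(axis=0) + grand
    ms_err = (resid ** 2).sum() / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

rng = np.random.default_rng(1)
vol_original = rng.normal(5000, 400, size=30)         # e.g. a subcortical volume
vol_defaced = vol_original + rng.normal(0, 40, 30)    # small processing noise
print(icc_3_1(np.column_stack([vol_original, vol_defaced])))  # ~0.99
```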
Subject(s)
Brain/diagnostic imaging, Data Anonymization, Image Processing, Computer-Assisted, Magnetic Resonance Imaging, Neuroimaging, Adult, Age Factors, Aged, Aged, 80 and over, Cerebral Cortex, Child, Female, Humans, Male, Middle Aged, Young Adult
ABSTRACT
PURPOSE: Publicly available data provision is an essential part of open science. However, open data can conflict with data privacy and data protection regulations. Head scans are particularly vulnerable because the subject's face can be reconstructed from the acquired images. Although defacing can impede subject identification in reconstructed images, this approach is not applicable to k-space raw data. To address this challenge and allow defacing of raw data for publication, we present chemical shift-based prospective k-space anonymization (CHARISMA). METHODS: In spin-warp imaging, fat shift occurs along the frequency-encoding direction. By placing an oil-filled mask onto the subject's face, the shifted fat signal can overlap with the face to deface k-space during the acquisition. The CHARISMA approach was tested for gradient-echo sequences in a single subject wearing the oil-filled mask at 7 T. Different fat shifts were compared by varying the readout bandwidth. Furthermore, intensity-based segmentation was used to test whether the images could be unmasked retrospectively. RESULTS: To impede subject identification after retrospective unmasking, the signal of face and shifted oil should overlap. In this single-subject study, a shift of 3.3 mm to 4.9 mm resulted in the most efficient masking. Independent of CHARISMA, long TEs induce signal decay and dephasing, which impeded unmasking. CONCLUSION: To the best of our knowledge, CHARISMA is the first prospective k-space defacing approach. With proper fat-shift direction and amplitude, this easy-to-build, low-cost solution impaired subject identification in gradient-echo data considerably. Further sequences will be tested with CHARISMA in the future.
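The underlying arithmetic is simple: the fat signal is displaced by the fat-water chemical shift (roughly 3.5 ppm, expressed in Hz at the scanner's field strength) divided by the per-pixel readout bandwidth. The bandwidth and pixel size below are assumed values chosen to land in the reported 3.3-4.9 mm range, not parameters from the paper.

```python
# Worked example of the fat-shift physics behind CHARISMA.
GAMMA_MHZ_PER_T = 42.577        # proton gyromagnetic ratio
B0_T = 7.0                      # field strength used in the study
FAT_WATER_PPM = 3.5e-6          # standard fat-water chemical shift

larmor_hz = GAMMA_MHZ_PER_T * 1e6 * B0_T       # ~298 MHz at 7 T
fat_shift_hz = FAT_WATER_PPM * larmor_hz       # ~1043 Hz

bw_per_pixel_hz = 220.0         # assumed readout bandwidth per pixel
pixel_size_mm = 0.9             # assumed in-plane resolution

shift_pixels = fat_shift_hz / bw_per_pixel_hz
shift_mm = shift_pixels * pixel_size_mm
print(f"fat shift: {shift_pixels:.1f} px = {shift_mm:.1f} mm")  # ~4.7 px, ~4.3 mm
# Lowering the readout bandwidth increases the shift, which is how the
# overlap of the oil mask's signal with the face can be tuned.
```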
Subject(s)
Magnetic Resonance Imaging, Prospective Studies, Retrospective Studies
ABSTRACT
Multisite medical data sharing is critical in modern clinical practice and medical research. The challenge is to conduct data sharing that preserves individual privacy and data utility. The shortcomings of traditional privacy-enhancing technologies mean that institutions rely upon bespoke data sharing contracts. The lengthy process and administration induced by these contracts increases the inefficiency of data sharing and may disincentivize important clinical treatment and medical research. This paper provides a synthesis between 2 novel advanced privacy-enhancing technologies-homomorphic encryption and secure multiparty computation (defined together as multiparty homomorphic encryption). These privacy-enhancing technologies provide a mathematical guarantee of privacy, with multiparty homomorphic encryption providing a performance advantage over separately using homomorphic encryption or secure multiparty computation. We argue multiparty homomorphic encryption fulfills legal requirements for medical data sharing under the European Union's General Data Protection Regulation, which has set a global benchmark for data protection. Specifically, the data processed and shared using multiparty homomorphic encryption can be considered anonymized data. We explain how multiparty homomorphic encryption can reduce the reliance upon customized contractual measures between institutions. The proposed approach can accelerate the pace of medical research while offering additional incentives for health care and research institutes to employ common data interoperability standards.
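To illustrate the core property these technologies rely on, the toy sketch below implements textbook Paillier encryption, which is additively homomorphic; it uses deliberately tiny, insecure parameters and is not the multiparty scheme the paper synthesizes.

```python
# Toy Paillier cryptosystem: adding ciphertexts adds the plaintexts.
# Insecure demo parameters; real deployments use >=2048-bit keys.
import math
import random

p, q = 293, 433                      # toy primes
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)
g = n + 1                            # standard generator choice
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)

def encrypt(m):
    # r should be coprime with n; for a toy demo we skip the gcd check
    r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Two sites encrypt local patient counts; their sum is computed on
# ciphertexts, without decrypting the individual contributions.
c1, c2 = encrypt(17), encrypt(25)
print(decrypt((c1 * c2) % n2))       # 42
```

Multiparty homomorphic encryption extends this idea by splitting the decryption key across institutions, so no single party can decrypt alone.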
Subject(s)
Computer Security/ethics, Information Dissemination/ethics, Privacy/legislation & jurisprudence, Technology/methods, Humans
ABSTRACT
Personal medical information is an essential resource for research; however, there are laws that regulate its use, and it typically has to be pseudonymized or anonymized. When data are anonymized, the quantity and quality of extractable information decrease significantly. From the perspective of a clinical researcher, a method of achieving pseudonymized data without degrading data quality while also preventing data loss is proposed herein. As the level of pseudonymization varies according to the research purpose, the pseudonymization method applied should be carefully chosen. Therefore, the active participation of clinicians is crucial to transform the data according to the research purpose. This can contribute to data security by simply transforming the data through secondary data processing. Case studies demonstrated that, compared with the initial baseline data, there was a clinically significant difference in the number of datapoints added with the participation of a clinician (from 267,979 to 280,127 points, P < 0.001). Thus, depending on the degree of clinician participation, data anonymization may not affect data quality and quantity, and proper data quality management along with data security are emphasized. Although the pseudonymization level and clinical use of data have a trade-off relationship, it is possible to create pseudonymized data while maintaining the data quality required for a given research purpose. Therefore, rather than relying solely on security guidelines, the active participation of clinicians is important.
Subject(s)
Data Accuracy, Data Anonymization, Biomedical Research, Cardiovascular Diseases/pathology, Data Anonymization/legislation & jurisprudence, Humans
ABSTRACT
Location privacy is a critical problem in vehicular communication networks. Vehicles broadcast road status information to other entities in the network through beacon messages. The beacon message content consists of the vehicle ID, speed, direction, position, and other information. An adversary could use vehicle identity and positioning information to determine vehicle driver behavior and identity at different visited location spots. A pseudonym can be used instead of the vehicle ID to help protect vehicle location privacy. These pseudonyms should be changed in an appropriate way to produce uncertainty for any adversary attempting to identify a vehicle at different locations. In the existing research literature, pseudonyms are changed during silent periods between neighbors. However, the use of a short silent period and the visibility of direct neighbors' pseudonyms provide a mechanism for an adversary to determine the identity of a target vehicle at specific locations. Moreover, privacy is provided to the driver only within roadside unit (RSU) range; outside it, there is no privacy protection. In this research, we address the problem of location privacy in a highway scenario, where vehicles travel at high speeds with diverse traffic density. We propose a Dynamic Grouping and Virtual Pseudonym-Changing (DGVP) scheme for vehicle location privacy. Dynamic groups are formed from vehicles with similar status, which then cooperatively change pseudonyms. In the case of low traffic density, we use a virtual pseudonym update process. We formally present the model and specify the scheme through High-Level Petri Nets (HLPN). The simulation results indicate that the proposed method improves the anonymity set size and entropy, provides lower traceability, reduces the impact on vehicular network applications, and has lower computation cost compared to existing research work.
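The two privacy metrics reported, anonymity set size and entropy, can be sketched directly; the adversary's belief distribution below is an illustrative assumption.

```python
# Anonymity-set entropy sketch: the entropy of the adversary's belief over
# which group member holds a given pseudonym. Probabilities are assumed.
import math

def anonymity_entropy(probabilities):
    """Shannon entropy in bits; maximal (log2 n) when the adversary cannot
    distinguish among the n vehicles in the mix group."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

uniform = [1 / 8] * 8               # ideal pseudonym change among 8 vehicles
skewed = [0.65] + [0.05] * 7        # adversary strongly suspects one vehicle

print(anonymity_entropy(uniform))   # 3.0 bits = log2(8), best case
print(anonymity_entropy(skewed))    # ~1.92 bits: privacy is much weaker
```

A scheme like DGVP aims to keep the post-change distribution close to uniform, maximizing both metrics.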