1.
Annu Rev Genomics Hum Genet ; 24: 347-368, 2023 08 25.
Article in English | MEDLINE | ID: mdl-37253596

ABSTRACT

Continued advances in precision medicine rely on the widespread sharing of data that relate human genetic variation to disease. However, data sharing is severely limited by legal, regulatory, and ethical restrictions that safeguard patient privacy. Federated analysis addresses this problem by transferring the code to the data, providing the technical and legal capability to analyze the data within their secure home environment rather than transferring the data to another institution for analysis. This allows researchers to gain new insights from data that cannot be moved, while respecting patient privacy and the data stewards' legal obligations. Because federated analysis is a technical solution to the legal challenges inherent in data sharing, the technology and policy implications must be evaluated together. Here, we summarize the technical approaches to federated analysis and provide a legal analysis of their policy implications.


Subject(s)
Fenbendazole , Privacy , Humans , Health Facilities , Information Dissemination , Policy
2.
Am J Epidemiol ; 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38973755

ABSTRACT

Epidemiologic studies frequently use risk ratios to quantify associations between exposures and binary outcomes. When the data are physically stored at multiple data partners, it can be challenging to perform individual-level analysis if data cannot be pooled centrally due to privacy constraints. Existing methods either require multiple file transfers between each data partner and an analysis center (e.g., distributed regression) or only provide approximate estimation of the risk ratio (e.g., meta-analysis). Here we develop a practical method that requires a single transfer of eight summary-level quantities from each data partner. Our approach leverages an existing risk-set method and software originally developed for Cox regression. Sharing only summary-level information, the proposed method provides risk ratio estimates and confidence intervals identical to those that would be provided - if individual-level data were pooled - by the modified Poisson regression. We justify the method theoretically, confirm its performance using simulated data, and implement it in a distributed analysis of COVID-19 data from the U.S. Food and Drug Administration's Sentinel System.

3.
BMC Public Health ; 24(1): 2523, 2024 Sep 17.
Article in English | MEDLINE | ID: mdl-39289666

ABSTRACT

BACKGROUND: Survey studies in medical and health sciences predominantly apply a conventional direct questioning (DQ) format to gather private and highly personal information. If the topic under investigation is sensitive or even stigmatizing, such as COVID-19-related health behaviors and adherence to non-pharmaceutical interventions in general, DQ surveys can lead to nonresponse and untruthful answers due to the influence of social desirability bias (SDB). These effects seriously threaten the validity of the results obtained, potentially leading to distorted prevalence estimates for behaviors for which the prevalence in the population is unknown. While this issue cannot be completely avoided, indirect questioning techniques (IQTs) offer a means to mitigate the harmful influence of SDB by guaranteeing the confidentiality of individual responses. The present study aims at assessing the validity of a recently proposed IQT, the Cheating Detection Triangular Model (CDTRM), in estimating the prevalence of COVID-19-related health behaviors while accounting for cheaters who disregard the instructions. METHODS: In an online survey of 1,714 participants in Taiwan, we obtained CDTRM prevalence estimates via an Expectation-Maximization algorithm for three COVID-19-related health behaviors with different levels of sensitivity. The CDTRM estimates were compared to DQ estimates and to available official statistics provided by the Taiwan Centers for Disease Control. Additionally, the CDTRM allowed us to estimate the share of cheaters who disregarded the instructions and adjust the prevalence estimates for the COVID-19-related health behaviors accordingly. RESULTS: For a behavior with low sensitivity, CDTRM and DQ estimates were expectedly comparable and in line with official statistics. However, for behaviors with medium and high sensitivity, CDTRM estimates were higher and thus presumably more valid than DQ estimates. 
Analogously, the estimated cheating rate increased with higher sensitivity of the behavior under study. CONCLUSIONS: Our findings strongly support the assumption that the CDTRM successfully controlled for the validity-threatening influence of SDB in a survey on three COVID-19-related health behaviors. Consequently, the CDTRM appears to be a promising technique to increase estimation validity compared to conventional DQ for health-related behaviors, and sensitive attributes in general, for which a strong influence of SDB is to be expected.


Subject(s)
COVID-19 , Health Behavior , Humans , COVID-19/epidemiology , Male , Female , Adult , Prevalence , Middle Aged , Taiwan/epidemiology , Deception , Young Adult , Surveys and Questionnaires , Adolescent , Models, Statistical , Aged
4.
J Med Internet Res ; 26: e55676, 2024 May 28.
Article in English | MEDLINE | ID: mdl-38805692

ABSTRACT

BACKGROUND: Clinical natural language processing (NLP) researchers need access to directly comparable evaluation results for applications such as text deidentification across a range of corpus types and the means to easily test new systems or corpora within the same framework. Current systems, reported metrics, and the personally identifiable information (PII) categories evaluated are not easily comparable. OBJECTIVE: This study presents an open-source and extensible end-to-end framework for comparing clinical NLP system performance across corpora even when the annotation categories do not align. METHODS: As a use case for this framework, we use 6 off-the-shelf text deidentification systems (ie, CliniDeID, deid from PhysioNet, MITRE Identity Scrubber Toolkit [MIST], NeuroNER, National Library of Medicine [NLM] Scrubber, and Philter) across 3 standard clinical text corpora for the task (2 of which are publicly available) and 1 private corpus (all in English), with annotation categories that are not directly analogous. The framework is built on shell scripts that can be extended to include new systems, corpora, and performance metrics. We present this open tool, multiple means for aligning PII categories during evaluation, and our initial timing and performance metric findings. Code for running this framework, with all settings needed to run all pairs, is available via Codeberg and GitHub. RESULTS: From this case study, we found large differences in processing speed between systems. The fastest system (ie, MIST) processed an average of 24.57 (SD 26.23) notes per second, while the slowest (ie, CliniDeID) processed an average of 1.00 notes per second. No system uniformly outperformed the others at identifying PII across corpora and categories. Instead, a rich tapestry of performance trade-offs emerged for PII categories.
CliniDeID and Philter prioritize recall over precision (with an average recall 6.9 and 11.2 points higher, respectively, for partially matching spans of text matching any PII category), while the other 4 systems consistently have higher precision (with MIST's precision scoring 20.2 points higher, NLM Scrubber scoring 4.4 points higher, NeuroNER scoring 7.2 points higher, and deid scoring 17.1 points higher). The macroaverage recall across corpora for identifying names, one of the more sensitive PII categories, included deid (48.8%) and MIST (66.9%) at the low end and NeuroNER (84.1%), NLM Scrubber (88.1%), and CliniDeID (95.9%) at the high end. A variety of metrics across categories and corpora are reported with a wider variety (eg, F2-score) available via the tool. CONCLUSIONS: NLP systems in general and deidentification systems and corpora in our use case tend to be evaluated in stand-alone research articles that only include a limited set of comparators. We hold that a single evaluation pipeline across multiple systems and corpora allows for more nuanced comparisons. Our open pipeline should reduce barriers to evaluation and system advancement.


Subject(s)
Natural Language Processing
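
As a reading aid for the metrics named in the abstract above (precision, recall, and the F2-score), the following minimal Python sketch computes an F-beta score from span counts. The function name `f_beta` and its count arguments are illustrative assumptions and are not taken from the published framework, which is built on shell scripts. With beta = 2, recall is weighted more heavily than precision.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true-positive, false-positive, and false-negative counts."""
    # Precision: share of predicted PII spans that were correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of true PII spans that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    # Weighted harmonic mean; beta > 1 favors recall, beta < 1 favors precision.
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For deidentification, a recall-weighted score such as F2 is a common choice because a missed identifier is usually costlier than an over-redacted token.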
5.
J Med Internet Res ; 26: e46455, 2024 Aug 20.
Article in English | MEDLINE | ID: mdl-39163593

ABSTRACT

BACKGROUND: Pregnancy and gestation information is routinely recorded in electronic medical record (EMR) systems across China in various data sets. The combination of data on the number of pregnancies and gestations can imply occurrences of abortions and other pregnancy-related issues, which is important for clinical decision-making and personal privacy protection. However, the distribution of this information inside EMR is variable due to inconsistent IT structures across different EMR systems. A large-scale quantitative evaluation of the potential exposure of this sensitive information has not been previously performed. Ensuring the protection of personal information is a priority, as emphasized in Chinese laws and regulations. OBJECTIVE: This study aims to perform the first nationwide quantitative analysis of the identification sites and exposure frequency of sensitive pregnancy and gestation information. The goal is to propose strategies for effective information extraction and privacy protection related to women's health. METHODS: This study was conducted in a national health care data network. Rule-based protocols for extracting pregnancy and gestation information were developed by a committee of experts. A total of 6 different sub-data sets of EMRs were used as schemas for data analysis and strategy proposal. The identification sites and frequencies of identification in different sub-data sets were calculated. Manual quality inspections of the extraction process were performed by 2 independent groups of reviewers on 1000 randomly selected records. Based on these statistics, strategies for effective information extraction and privacy protection were proposed. RESULTS: The data network covered hospitalized patients from 19 hospitals in 10 provinces of China, encompassing 15,245,055 patients over an 11-year period (January 1, 2010-December 12, 2020).
Among women aged 14-50 years, 70% were randomly selected from each hospital, resulting in a total of 1,110,053 patients. Of these, 688,268 female patients with sensitive reproductive information were identified. The frequencies of identification were variable, with the marriage history in admission medical records being the most frequent at 63.24%. Notably, more than 50% of female patients were identified with pregnancy and gestation history in nursing records, which is not generally considered a sub-data set rich in reproductive information. During the manual curation and review process, 1000 cases were randomly selected, and the precision and recall rates of the information extraction method both exceeded 99.5%. The privacy-protection strategies were designed with clear technical directions. CONCLUSIONS: Significant amounts of critical information related to women's health are recorded in Chinese routine EMR systems and are distributed in various parts of the records with different frequencies. This requires a comprehensive protocol for extracting and protecting the information, which has been demonstrated to be technically feasible. Implementing a data-based strategy will enhance the protection of women's privacy and improve the accessibility of health care services.


Subject(s)
Confidentiality , Electronic Health Records , Humans , Pregnancy , Female , China , Retrospective Studies , Adult
6.
Sensors (Basel) ; 24(16)2024 Aug 06.
Article in English | MEDLINE | ID: mdl-39204802

ABSTRACT

The Windows registry contains a plethora of information in a hierarchical database. It includes system-wide settings, user preferences, installed programs, and recently accessed files and maintains timestamps that can be used to construct a detailed timeline of user activities. However, these data are unencrypted and thus vulnerable to exploitation by malicious actors who gain access to this repository. To address this security and privacy concern, we propose a novel approach that efficiently encrypts and decrypts sensitive registry data in real time. Our developed proof-of-concept program intercepts interactions between the registry's application programming interfaces (APIs) and other Windows applications using an advanced hooking technique. This enables the proposed system to be transparent to users without requiring any changes to the operating system or installed software. Our approach also implements the data protection API (DPAPI) developed by Microsoft to securely manage each user's encryption key. Ultimately, our research provides an enhanced security and privacy framework for the Windows registry, effectively fortifying the registry against security and privacy threats while maintaining its accessibility to legitimate users and applications.

7.
Entropy (Basel) ; 26(3)2024 Mar 13.
Article in English | MEDLINE | ID: mdl-38539765

ABSTRACT

The drawbacks of a one-dimensional chaotic map are its straightforward structure, abrupt intervals, and ease of signal prediction. Richer performance and a more complicated structure are required for multidimensional chaotic mapping. To address the shortcomings of current chaotic systems, an n-dimensional cosine-transform-based chaotic system (nD-CTBCS) with a chaotic coupling model is suggested in this study. To create chaotic maps of any desired dimension, nD-CTBCS can take advantage of already-existing 1D chaotic maps as seed chaotic maps. Three two-dimensional chaotic maps are provided as examples to illustrate the impact. The findings of the evaluation and experiments demonstrate that the newly created chaotic maps function better, have broader chaotic intervals, and display hyperchaotic behavior. To further demonstrate the practicability of nD-CTBCS, a reversible data hiding scheme is proposed for the secure communication of medical images. The experimental results show that the proposed method has higher security than the existing methods.

8.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32591779

ABSTRACT

Genome-wide association studies (GWAS) have been widely used for identifying potential risk variants in various diseases. A statistically meaningful GWAS typically requires a large sample size to detect disease-associated single nucleotide polymorphisms (SNPs). However, a single institution usually only possesses a limited number of samples. Therefore, cross-institutional partnerships are required to increase sample size and statistical power. However, cross-institutional partnerships offer significant challenges, a major one being data privacy. For example, the privacy awareness of people, the impact of data privacy leakages and the privacy-related risks are becoming increasingly important, while there is no de-identification standard available to safeguard genomic data sharing. In this paper, we introduce a novel privacy-preserving federated GWAS framework (iPRIVATES). Equipped with privacy-preserving federated analysis, iPRIVATES enables multiple institutions to jointly perform GWAS analysis without leaking patient-level genotyping data. Only aggregated local statistics are exchanged within the study network. In addition, we evaluate the performance of iPRIVATES through both simulated data and a real-world application for identifying potential risk variants in ankylosing spondylitis (AS). The experimental results showed that the strongest signal of AS-associated SNPs reside mostly around the human leukocyte antigen (HLA) regions. The proposed iPRIVATES framework achieved equivalent results as traditional centralized implementation, demonstrating its great potential in driving collaborative genomic research for different diseases while preserving data privacy.


Subject(s)
Genetic Predisposition to Disease , Genome-Wide Association Study , Privacy , Spondylitis, Ankylosing/genetics , Genotype , Humans
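
To make the phrase "only aggregated local statistics are exchanged" in the abstract above concrete, here is a minimal Python sketch of one way sites could contribute to a pooled single-SNP test without sharing genotypes. It is an illustration under stated assumptions, not the iPRIVATES protocol: each site is assumed to send one 2x2 allele-count table, and the names `pool_counts` and `allelic_chi_square` are hypothetical.

```python
from typing import Iterable, Tuple

# Per-site summary for one SNP:
# (case_alt, case_ref, control_alt, control_ref) allele counts.
Counts = Tuple[int, int, int, int]


def pool_counts(site_counts: Iterable[Counts]) -> Counts:
    """Sum the per-site tables; no patient-level genotype leaves a site."""
    a = b = c = d = 0
    for sa, sb, sc, sd in site_counts:
        a += sa
        b += sb
        c += sc
        d += sd
    return a, b, c, d


def allelic_chi_square(counts: Counts) -> float:
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 table."""
    a, b, c, d = counts
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: an empty row or column
    return n * (a * d - b * c) ** 2 / denom
```

Because the counts are additive, the pooled statistic equals the one a centralized analysis of the same samples would produce for this simple test. Small per-site counts can still leak information, which is why real systems add further protections.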
9.
BMC Med Res Methodol ; 23(1): 233, 2023 10 13.
Article in English | MEDLINE | ID: mdl-37833641

ABSTRACT

BACKGROUND: When data is distributed across multiple sites, sharing information at the individual level among sites may be difficult. In these multi-site studies, a propensity score model can be fitted with data within each site or data from all sites when using inverse probability-weighted Cox regression to estimate the overall hazard ratio. However, when there is unknown heterogeneity of covariates in different sites, either approach may lead to potential bias or reduced efficiency. In this study, we proposed a method to estimate the propensity score based on a covariate balance-related criterion and estimate the overall hazard ratio while overcoming data sharing constraints across sites. METHODS: The proposed propensity score was generated by choosing between the global and local propensity scores based on a covariate balance-related criterion, combining the global propensity score fitted in the entire population and the local propensity score fitted within each site. We used this proposed propensity score to estimate the overall hazard ratio of distributed survival data with multiple sites, while requiring only summary-level information across sites. We conducted simulation studies to evaluate the performance of the proposed method. Besides, we applied the proposed method to real-world data to examine the effect of radiation therapy on time to death among breast cancer patients. RESULTS: The simulation studies showed that the proposed method improved performance in estimating the overall hazard ratio compared with the global and local propensity score methods, regardless of the number of sites and the sample size in each site. Similar results were observed under both homogeneous and heterogeneous settings. Besides, the proposed method yielded identical results to the pooled individual-level data analysis.
The real-world data analysis indicated that the proposed method was more likely to find a significant effect of radiation therapy on mortality compared to the global propensity score method and local propensity score method. CONCLUSIONS: The proposed covariate balance-related propensity score in multi-site distributed survival data outperformed the global propensity score estimated using data from the entire population or the local propensity score estimated within each site in estimating the overall hazard ratio. The proposed approach can be performed without individual-level data transfer between sites and would yield the same results as the corresponding pooled individual-level data analysis.


Subject(s)
Information Dissemination , Humans , Propensity Score , Proportional Hazards Models , Computer Simulation , Information Dissemination/methods , Bias
10.
Neuroradiology ; 65(7): 1091-1099, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37160454

ABSTRACT

Commercial software based on artificial intelligence (AI) is entering clinical practice in neuroradiology. Consequently, medico-legal aspects of using Software as a Medical Device (SaMD) become increasingly important. These medico-legal issues warrant an interdisciplinary approach and may affect the way we work in daily practice. In this article, we seek to address three major topics: medical malpractice liability, regulation of AI-based medical devices, and privacy protection in shared medical imaging data, thereby focusing on the legal frameworks of the European Union and the USA. As many of the presented concepts are very complex and, in part, remain yet unsolved, this article is not meant to be comprehensive but rather thought-provoking. The goal is to engage clinical neuroradiologists in the debate and equip them to actively shape these topics in the future.


Subject(s)
Artificial Intelligence , Malpractice , Humans , Software , Radiologists
11.
J Med Internet Res ; 25: e46700, 2023 03 30.
Article in English | MEDLINE | ID: mdl-36995757

ABSTRACT

Brauneck and colleagues have combined technical and legal perspectives in their timely and valuable paper "Federated Machine Learning, Privacy-Enhancing Technologies, and Data Protection Laws in Medical Research: Scoping Review." Researchers who design mobile health (mHealth) systems must adopt the same privacy-by-design approach that privacy regulations (eg, General Data Protection Regulation) do. In order to do this successfully, we will have to overcome implementation challenges in privacy-enhancing technologies such as differential privacy. We will also have to pay close attention to emerging technologies such as private synthetic data generation.


Subject(s)
Biomedical Research , Telemedicine , Humans , Privacy , Computer Security , Machine Learning
12.
J Med Internet Res ; 25: e41588, 2023 03 30.
Article in English | MEDLINE | ID: mdl-36995759

ABSTRACT

BACKGROUND: The collection, storage, and analysis of large data sets are relevant in many sectors. Especially in the medical field, the processing of patient data promises great progress in personalized health care. However, it is strictly regulated, such as by the General Data Protection Regulation (GDPR). These regulations mandate strict data security and data protection and, thus, create major challenges for collecting and using large data sets. Technologies such as federated learning (FL), especially paired with differential privacy (DP) and secure multiparty computation (SMPC), aim to solve these challenges. OBJECTIVE: This scoping review aimed to summarize the current discussion on the legal questions and concerns related to FL systems in medical research. We were particularly interested in whether and to what extent FL applications and training processes are compliant with the GDPR data protection law and whether the use of the aforementioned privacy-enhancing technologies (DP and SMPC) affects this legal compliance. We placed special emphasis on the consequences for medical research and development. METHODS: We performed a scoping review according to the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews). We reviewed articles on Beck-Online, SSRN, ScienceDirect, arXiv, and Google Scholar published in German or English between 2016 and 2022. We examined 4 questions: whether local and global models are "personal data" as per the GDPR; what the "roles" as defined by the GDPR of various parties in FL are; who controls the data at various stages of the training process; and how, if at all, the use of privacy-enhancing technologies affects these findings. RESULTS: We identified and summarized the findings of 56 relevant publications on FL. Local and likely also global models constitute personal data according to the GDPR. 
FL strengthens data protection but is still vulnerable to a number of attacks and the possibility of data leakage. These concerns can be successfully addressed through the privacy-enhancing technologies SMPC and DP. CONCLUSIONS: Combining FL with SMPC and DP is necessary to fulfill the legal data protection requirements (GDPR) in medical research dealing with personal data. Even though some technical and legal challenges remain, for example, the possibility of successful attacks on the system, combining FL with SMPC and DP creates enough security to satisfy the legal requirements of the GDPR. This combination thereby provides an attractive technical solution for health institutions willing to collaborate without exposing their data to risk. From a legal perspective, the combination provides enough built-in security measures to satisfy data protection requirements, and from a technical perspective, the combination provides secure systems with comparable performance with centralized machine learning applications.


Subject(s)
Biomedical Research , Privacy , Humans , Computer Security , Delivery of Health Care
13.
J Econom ; 235(2): 444-453, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37701878

ABSTRACT

Differential privacy is becoming one gold standard for protecting the privacy of publicly shared data. It has been widely used in social science, data science, public health, information technology, and the U.S. decennial census. Nevertheless, to guarantee differential privacy, existing methods may unavoidably alter the conclusion of original data analysis, as privatization often changes the sample distribution. This phenomenon is known as the trade-off between privacy protection and statistical accuracy. In this work, we mitigate this trade-off by developing a distribution-invariant privatization (DIP) method to reconcile both high statistical accuracy and strict differential privacy. As a result, any downstream statistical or machine learning task yields essentially the same conclusion as if one used the original data. Numerically, under the same strictness of privacy protection, DIP achieves superior statistical accuracy in a wide range of simulation studies and real-world benchmarks.

14.
Sensors (Basel) ; 23(8)2023 Apr 16.
Article in English | MEDLINE | ID: mdl-37112370

ABSTRACT

With the rapid development of the Internet of Things (IoT) technology, Wi-Fi signals have been widely used for trajectory signal acquisition. Indoor trajectory matching aims to achieve the monitoring of the encounters between people and trajectory analysis in indoor environments. Due to constraints on the computational abilities of IoT devices, the computation of indoor trajectory matching requires the assistance of a cloud platform, which brings up privacy concerns. Therefore, this paper proposes a trajectory-matching calculation method that supports ciphertext operations. Hash algorithms and homomorphic encryption are selected to ensure the security of different private data, and the actual trajectory similarity is determined based on correlation coefficients. However, due to obstacles and other interferences in indoor environments, the original data collected may be missing in certain stages. Therefore, this paper also complements the missing values on ciphertexts through mean, linear regression, and KNN algorithms. These algorithms can predict the missing parts of the ciphertext dataset, and the accuracy of the complemented dataset can reach over 97%. This paper provides original and complemented datasets for matching calculations, and demonstrates their high feasibility and effectiveness in practical applications from the perspective of calculation time and accuracy loss.

15.
Sensors (Basel) ; 23(4)2023 Feb 16.
Article in English | MEDLINE | ID: mdl-36850842

ABSTRACT

Location-based application services and location privacy protection solutions are often required for the storage, management, and efficient retrieval of large amounts of geolocation data for specific locations or location intervals. We design a hierarchical tree-like organization structure, GL-Tree, which enables the storage, management, and retrieval of massive location data and satisfies the user's location-hiding requirements. We first use Geohash encoding to convert the two-dimensional geospatial coordinates of locations into one-dimensional strings and construct the GL-Tree based on the Geohash encoding principle. We gradually reduce the location intervals by extending the length of the Geohash code to achieve geospatial grid division and spatial approximation of user locations. The hierarchical tree structure of GL-Tree reflects the correspondence between Geohash codes and geographic intervals. Users and their location relationships are recorded in the leaf nodes at each level of the hierarchical GL-Tree. In top-down order, along the GL-Tree, efficient storage and retrieval of location sets for specified locations and specified intervals can be achieved. We conducted experimental tests on the Gowalla public dataset and compared the performance of the B+ tree, R tree, and GL-Tree in terms of time consumption in three aspects: tree construction, location insertion, and location retrieval, and the results show that GL-Tree has good performance in terms of time consumption.

16.
Sensors (Basel) ; 23(11)2023 May 31.
Article in English | MEDLINE | ID: mdl-37299946

ABSTRACT

Location-based services (LBS) are widely used due to the rapid development of mobile devices and location technology. Users usually provide precise location information to LBS to access the corresponding services. However, this convenience comes with the risk of location privacy disclosure, which can infringe upon personal privacy and security. In this paper, a location privacy protection method based on differential privacy is proposed, which efficiently protects users' locations, without degrading the performance of LBS. First, a location-clustering (L-clustering) algorithm is proposed to divide the continuous locations into different clusters based on the distance and density relationships among multiple groups. Then, a differential privacy-based location privacy protection algorithm (DPLPA) is proposed to protect users' location privacy, where Laplace noise is added to the resident points and centroids within the cluster. The experimental results show that the DPLPA achieves a high level of data utility, with minimal time consumption, while effectively protecting the privacy of location information.


Subject(s)
Privacy , Technology , Algorithms , Computers, Handheld , Cluster Analysis , Computer Security
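
The abstract above states that Laplace noise is added to resident points and cluster centroids. The minimal Python sketch below shows the generic Laplace mechanism applied to a coordinate pair. It is not the DPLPA algorithm: the sensitivity value, the privacy budget, and the function names are assumptions chosen for illustration, and the sensitivity is assumed to be the L1 sensitivity of the whole point.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponentials with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_point(point, sensitivity: float, epsilon: float, rng=None):
    """Add independent Laplace noise with scale sensitivity/epsilon to each coordinate."""
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return tuple(coord + laplace_noise(scale, rng) for coord in point)
```

A smaller epsilon gives a larger noise scale and therefore stronger privacy at the cost of location accuracy, which is the utility trade-off the abstract refers to.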
17.
Sensors (Basel) ; 23(3)2023 Jan 19.
Article in English | MEDLINE | ID: mdl-36772204

ABSTRACT

In VANETs, owing to the openness of wireless communication, it is necessary to change pseudonyms frequently to realize the unlinkability of vehicle identity. Moreover, identity authentication is needed, which is usually completed by digital certificates or a trusted third party. The storage and the communication overhead are high. This paper proposes a triple pseudonym authentication scheme for VANETs based on the Cuckoo Filter and Paillier homomorphic encryption (called TriNymAuth). TriNymAuth applies Paillier homomorphic encryption, a Cuckoo Filter combining filter-level and bucket-level, and a triple pseudonym (homomorphic pseudonym, local pseudonym, and virtual pseudonym) authentication to the vehicle identity authentication scheme. It reduces the dependence on a trusted third party and ensures the privacy and security of vehicle identity while improving authentication efficiency. Experimental results show that the insert overhead of the Cuckoo Filter is about 10 µs, and the query overhead reaches the ns level. Furthermore, TriNymAuth has significant cost advantages, with an OBU enrollment cost of only 0.884 ms. When the data rate in VANETs dr≤ 180 kbps, TriNymAuth has the smallest total transmission delay cost and is suitable for shopping malls and other places with dense traffic.
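
For readers unfamiliar with the additive homomorphism that the abstract above relies on, here is a toy Python sketch of textbook Paillier encryption with the common choice g = n + 1. It is not the TriNymAuth scheme, the function names are illustrative, and the tiny primes in the usage note are for demonstration only and offer no security. It assumes Python 3.8 or later for `pow(x, -1, n)`.

```python
import math
import random


def keygen(p: int, q: int):
    """Build a Paillier key pair from two distinct primes (caller supplies them)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with g = n + 1, mu is the inverse of lambda mod n
    return n, (lam, mu)


def encrypt(n: int, m: int, rng=random) -> int:
    """Encrypt 0 <= m < n as c = (n + 1)^m * r^n mod n^2 with random r."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:  # r must be invertible mod n
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, private_key, c: int) -> int:
    """Recover m = L(c^lambda mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)
```

For example, with `n, priv = keygen(1009, 1013)`, decrypting `add_encrypted(n, encrypt(n, 123), encrypt(n, 456))` yields 579, although neither plaintext was revealed to the party doing the multiplication.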

18.
Sensors (Basel) ; 23(3)2023 Jan 31.
Article in English | MEDLINE | ID: mdl-36772594

ABSTRACT

Currently, a significant focus has been established on the privacy protection of multi-dimensional data publishing in various application scenarios, such as scientific research and policy-making. The K-anonymity mechanism based on clustering is the main method of shared-data desensitization, but it causes problems of inconsistent clustering results and low clustering accuracy. It also cannot defend against several common attacks, such as skewness and similarity attacks, at the same time. To defend against these attacks, we propose a K-anonymity privacy protection algorithm for multi-dimensional data against skewness and similarity attacks (KAPP) combined with t-closeness. Firstly, we propose a multi-dimensional sensitive data clustering algorithm based on improved African vultures optimization. More specifically, we improve the initialization, fitness calculation, and solution update strategy of the clustering center. The improved African vultures optimization can provide the optimal solution with various dimensions and achieve highly accurate clustering of the multi-dimensional dataset based on multiple sensitive attributes. It ensures that multi-dimensional data of different clusters are different in sensitive data. After the dataset anonymization, similar sensitive data within the same equivalence class become fewer, so the precondition for skewness and similarity attacks no longer holds. We also propose an equivalence class partition method based on the sensitive data distribution difference value measurement and t-closeness. Namely, we calculate the sensitive data distribution's difference value of each equivalence class and then combine the equivalence classes with larger difference values. Each equivalence class satisfies t-closeness.
This method can ensure that multi-dimensional data of the same equivalence class are different in multiple sensitive attributes, and thus can effectively defend against skewness and similarity attacks. Moreover, we generalize sensitive attributes with significant weight and all quasi-identifier attributes to achieve anonymous protection of the dataset. The experimental results show that KAPP improves clustering accuracy, diversity, and anonymity compared to other similar methods under skewness and similarity attacks.
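The two properties named in this abstract, K-anonymity and t-closeness, can be checked on a small table with a sketch like the following. It does not reproduce KAPP or its clustering step. The helper names are hypothetical, rows are assumed to be dictionaries, and the distance is the variational distance, which matches the Earth Mover's Distance only under the assumption of a categorical sensitive attribute with equal ground distances.

```python
from collections import Counter, defaultdict


def equivalence_classes(rows, quasi_ids):
    """Group rows that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[q] for q in quasi_ids)].append(row)
    return classes


def is_k_anonymous(rows, quasi_ids, k):
    """True if every equivalence class contains at least k rows."""
    return all(len(g) >= k for g in equivalence_classes(rows, quasi_ids).values())


def distribution(rows, sensitive):
    """Relative frequency of each sensitive value in the given rows."""
    counts = Counter(r[sensitive] for r in rows)
    return {value: c / len(rows) for value, c in counts.items()}


def variational_distance(p, q):
    """Half the L1 distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def max_class_distance(rows, quasi_ids, sensitive):
    """Largest distance between any class distribution and the overall one.

    The table satisfies t-closeness (under the stated assumption) when
    this value is at most t.
    """
    overall = distribution(rows, sensitive)
    return max(
        variational_distance(distribution(g, sensitive), overall)
        for g in equivalence_classes(rows, quasi_ids).values()
    )
```

A table passes the combined check for given `k` and `t` when `is_k_anonymous(rows, quasi_ids, k)` holds and `max_class_distance(rows, quasi_ids, sensitive) <= t`.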

19.
Sensors (Basel) ; 23(4)2023 Feb 13.
Article in English | MEDLINE | ID: mdl-36850701

ABSTRACT

Blockchain introduces challenges related to the reliability of user identity and identity management systems; this includes detecting unfalsified identities linked to IoT applications. This study focuses on optimizing user identity verification time by employing an efficient encryption algorithm for the user signature in a peer-to-peer decentralized IoT blockchain network. To achieve this, a user signature-based identity management framework is examined by using various encryption techniques and contrasting various hash functions built on top of the Modified Merkle Hash Tree (MMHT) data structure algorithm. The paper presents the execution of varying dataset sizes based on transactions between nodes to test the scalability of the proposed design for secure blockchain communication. The results show that the MMHT data structure algorithm with SHA3 hashing and AES-128 encryption gives the lowest execution time, offering a minimum of 36% gain in time optimization compared to other algorithms. This work shows that using the AES-128 encryption algorithm with the MMHT algorithm and SHA3 hash function not only identifies malicious code but also improves user integrity check performance in a blockchain network, while ensuring network scalability. Therefore, this study presents the performance evaluation of a blockchain network considering its distinct types, properties, components, and algorithms' taxonomy.
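How a Merkle hash tree condenses a set of transactions into one root hash can be sketched with SHA3-256 from Python's standard `hashlib`. This is a plain Merkle tree, not the Modified Merkle Hash Tree evaluated in the paper, whose modifications are not described in the abstract. The function names, the leaf/node prefix bytes, and the choice to duplicate the last node on odd levels are assumptions made for illustration.

```python
import hashlib


def sha3(data: bytes) -> bytes:
    """SHA3-256 digest of the given bytes."""
    return hashlib.sha3_256(data).digest()


def merkle_root(leaves):
    """Compute the root hash of a plain binary Merkle tree.

    Leaves and internal nodes get different prefix bytes so that a leaf
    can never be confused with an internal node (an illustrative choice).
    """
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [sha3(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:  # odd count: duplicate the last node
            level.append(level[-1])
        level = [
            sha3(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

Changing any single transaction changes the root, so a node that stores only the 32-byte root can detect tampering with the underlying transaction set.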

20.
Entropy (Basel) ; 25(8)2023 Jul 27.
Article in English | MEDLINE | ID: mdl-37628155

ABSTRACT

Federated learning is a distributed machine learning framework that allows users to keep data locally for training without sharing it. Users send the trained local model to the server for aggregation. However, untrusted servers may infer users' private information from the provided data and incorrectly execute aggregation protocols to forge aggregation results. To ensure the reliability of the federated learning scheme, we must protect the privacy of users' information and ensure the integrity of the aggregation results. This paper proposes an effective, verifiable secure aggregation scheme for federated learning that offers both high communication efficiency and privacy protection. The scheme encrypts the gradients with a single mask technology to securely aggregate gradients, thus ensuring that malicious servers cannot deduce users' private information from the provided data. Then the masked gradients are hashed to verify the aggregation results. The experimental results show that our protocol is better suited to bandwidth-constrained scenarios and scenarios with offline users.
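The general idea of masking gradients so that a server learns only their sum can be sketched as follows. This uses pairwise cancelling masks, a generic technique swapped in for illustration; it is not the single-mask construction or the hash-based verification of the paper, which the abstract does not specify. Gradients are assumed to be quantized to integers, and `pair_seed` is a hypothetical stand-in for a secret that each pair of users would agree on in a real protocol.

```python
import random

MOD = 2 ** 32  # all arithmetic is done modulo this value (illustrative choice)


def pair_seed(a, b):
    """Hypothetical shared seed for users a and b; symmetric in its arguments."""
    return f"{min(a, b)}-{max(a, b)}"


def pairwise_mask(user, users, dim):
    """Sum of per-pair masks; the lower id adds and the higher id subtracts."""
    mask = [0] * dim
    for other in users:
        if other == user:
            continue
        rng = random.Random(pair_seed(user, other))
        shared = [rng.randrange(MOD) for _ in range(dim)]
        sign = 1 if user < other else -1
        mask = [(m + sign * s) % MOD for m, s in zip(mask, shared)]
    return mask


def mask_update(update, mask):
    """Hide an integer-quantized update behind the user's mask."""
    return [(u + m) % MOD for u, m in zip(update, mask)]


def aggregate(masked_updates):
    """Server-side sum; the pairwise masks cancel across all users."""
    dim = len(masked_updates[0])
    return [sum(v[i] for v in masked_updates) % MOD for i in range(dim)]
```

Because every shared mask is added by one user and subtracted by another, the masks cancel in the sum only when all users report. Handling dropped users requires additional machinery that this sketch omits.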
