Results 1 - 20 of 264
1.
Int J Med Inform ; 191: 105582, 2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39096591

ABSTRACT

OBJECTIVE: To describe the use of privacy preserving linkage methods operationally in Australia, and to present insights and key learnings from their implementation. METHODS: Privacy preserving record linkage (PPRL) utilising Bloom filters provides a unique practical mechanism that allows linkage to occur without the release of personally identifiable information (PII), while still ensuring high accuracy. RESULTS: The methodology has received wide uptake within Australia, with four state linkage units with privacy preserving capability. It has enabled access to general practice and private pathology data, among others, both much sought-after datasets previously inaccessible for linkage. CONCLUSION: The Australian experience suggests privacy preserving linkage is a practical solution for improving data access for policy, planning and population health research. It is hoped interest in this methodology internationally continues to grow.
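
The Bloom-filter linkage idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the Australian linkage units' implementation: the bigram tokenisation, the keyed HMAC-SHA256 hashing, the filter size of 256 bits, the four hash functions, and all function names are assumptions chosen for illustration. Each party encodes a name field into a set of bit positions using a shared secret, and only those encodings are compared, using the Dice coefficient.

```python
import hashlib
import hmac


def bigrams(value: str) -> set[str]:
    """Split a padded, lower-cased string into character bigrams."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value: str, secret: bytes, size: int = 256, num_hashes: int = 4) -> set[int]:
    """Return the bit positions set in a Bloom filter for this value.

    Assumption: every bigram is hashed num_hashes times with a keyed HMAC,
    so the encoding cannot be reproduced without the shared secret.
    """
    positions = set()
    for gram in bigrams(value):
        for k in range(num_hashes):
            digest = hmac.new(secret, f"{k}:{gram}".encode(), hashlib.sha256).digest()
            positions.add(int.from_bytes(digest[:8], "big") % size)
    return positions


def dice(a: set[int], b: set[int]) -> float:
    """Dice coefficient between two Bloom filters given as sets of set bits."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    key = b"shared-linkage-secret"  # hypothetical key agreed by the data custodians
    smith = bloom_encode("Smith", key)
    print(dice(smith, bloom_encode("Smyth", key)))  # similar spellings score high
    print(dice(smith, bloom_encode("Jones", key)))  # unrelated names score low
```

Because similar spellings share most bigrams, their encodings overlap heavily, which is what lets approximate matching proceed without exchanging the underlying PII.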

2.
Neural Netw ; 179: 106574, 2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39096754

ABSTRACT

Graph neural networks (GNN) are widely used in recommendation systems, but traditional centralized methods raise privacy concerns. To address this, we introduce a federated framework for privacy-preserving GNN-based recommendations. This framework allows distributed training of GNN models using local user data. Each client trains a GNN using its own user-item graph and uploads gradients to a central server for aggregation. To overcome limited data, we propose expanding local graphs using Software Guard Extension (SGX) and Local Differential Privacy (LDP). SGX computes node intersections for subgraph exchange and expansion, while local differential privacy ensures privacy. Additionally, we introduce a personalized approach with Prototype Networks (PN) and Model-Agnostic Meta-Learning (MAML) to handle data heterogeneity. This enhances the encoding abilities of the federated meta-learner, enabling precise fine-tuning and quick adaptation to diverse client graph data. We leverage SGX and local differential privacy for secure parameter sharing and defense against malicious servers. Comprehensive experiments across six datasets demonstrate our method's superiority over centralized GNN-based recommendations, while preserving user privacy.
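
As a rough illustration of the local differential privacy (LDP) component named in this abstract, the sketch below applies binary randomized response to a client's user-item interaction bits before they leave the device. This is a textbook LDP mechanism and not necessarily the one the authors use; the function names and the choice of randomized response are assumptions.

```python
import math
import random


def keep_probability(epsilon: float) -> float:
    """Probability of reporting the true bit under epsilon-LDP randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomize_bits(bits: list[int], epsilon: float, rng: random.Random) -> list[int]:
    """Flip each 0/1 bit independently with probability 1 - keep_probability(epsilon)."""
    p = keep_probability(epsilon)
    return [b if rng.random() < p else 1 - b for b in bits]


def estimate_ones(noisy_bits: list[int], epsilon: float) -> float:
    """Unbiased estimate of the true number of 1s (requires epsilon > 0)."""
    p = keep_probability(epsilon)
    n = len(noisy_bits)
    return (sum(noisy_bits) - n * (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = random.Random(42)
    interactions = [1, 0, 0, 1, 1, 0, 1, 0] * 500  # hypothetical local interaction bits
    noisy = randomize_bits(interactions, epsilon=1.0, rng=rng)
    print(sum(interactions), round(estimate_ones(noisy, epsilon=1.0)))
```

A smaller epsilon flips more bits and gives stronger privacy, at the cost of a noisier estimate on the receiving side.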

3.
Article in English | MEDLINE | ID: mdl-39047294

ABSTRACT

OBJECTIVES: To understand the landscape of privacy preserving record linkage (PPRL) applications in public health, assess estimates of PPRL accuracy and privacy, and evaluate factors for PPRL adoption. MATERIALS AND METHODS: A literature scan examined the accuracy, data privacy, and scalability of PPRL in public health. Twelve interviews with subject matter experts were conducted and coded using an inductive approach to identify factors related to PPRL adoption. RESULTS: PPRL has a high level of linkage quality and accuracy. PPRL linkage quality was comparable to that of clear text linkage methods (requiring direct personally identifiable information [PII]) for linkage across various settings and research questions. Accuracy of PPRL depended on several components, such as PPRL technique, and the proportion of missingness and errors in underlying data. Strategies to increase adoption include increasing understanding of PPRL, improving data owner buy-in, establishing governance structure and oversight, and developing a public health implementation strategy for PPRL. DISCUSSION: PPRL protects privacy by eliminating the need to share PII for linkage, but the accuracy and linkage quality depend on factors including the choice of PPRL technique and specific PII used to create encrypted identifiers. Large-scale implementations of PPRL linking millions of observations (including PCORnet, the National Institutes of Health N3C, and the Centers for Disease Control and Prevention COVID-19 project) have demonstrated the scalability of PPRL for public health applications. CONCLUSIONS: Applications of PPRL in public health have demonstrated their value for the public health community. Although gaps must be addressed before wide implementation, PPRL is a promising solution to data linkage challenges faced by the public health ecosystem.

4.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39073827

ABSTRACT

Genome-wide association studies (GWAS) serve as a crucial tool for identifying genetic factors associated with specific traits. However, ethical constraints prevent the direct exchange of genetic information, prompting the need for privacy preservation solutions. To address these issues, earlier works rely on cryptographic mechanisms such as homomorphic encryption, secure multi-party computing, and differential privacy. Very recently, federated learning has emerged as a promising solution for enabling secure and collaborative GWAS computations. This work provides an extensive overview of existing privacy-preserving methods for GWAS, with the main focus on collaborative and distributed approaches. This survey provides a comprehensive analysis of the challenges faced by existing methods, their limitations, and insights into designing efficient solutions.


Subject(s)
Genetic Privacy, Genome-Wide Association Study, Genome-Wide Association Study/methods, Humans, Genomics/methods, Computer Security
5.
Sensors (Basel) ; 24(14)2024 Jul 09.
Article in English | MEDLINE | ID: mdl-39065842

ABSTRACT

This paper presents an on-device semi-supervised human activity detection system that can learn and predict human activity patterns in real time. The clinical objective is to monitor and detect the unhealthy sedentary lifestyle of a user. The proposed semi-supervised learning (SSL) framework uses sparsely labelled user activity events acquired from Inertial Measurement Unit sensors installed as wearable devices. The proposed cluster-based learning model in this approach is trained with data from the same target user, thus preserving data privacy while providing personalized activity detection services. Two different cluster labelling strategies, namely, population-based and distance-based strategies, are employed to achieve the desired classification performance. The proposed system is shown to be highly accurate and computationally efficient for different algorithmic parameters, which is relevant in the context of limited computing resources on typical wearable devices. Extensive experimentation and simulation study have been conducted on multi-user human activity data from the public domain in order to analyze the trade-off between classification accuracy and computation complexity of the proposed learning paradigm with different algorithmic hyper-parameters. With 4.17 h of training time for 8000 activity episodes, the proposed SSL approach consumes at most 20 KB of CPU memory space, while providing a maximum accuracy of 90% and 100% classification rates.


Subject(s)
Algorithms, Wearable Electronic Devices, Humans, Physiologic Monitoring/methods, Physiologic Monitoring/instrumentation, Privacy, Supervised Machine Learning, Human Activities, Precision Medicine/methods
6.
Sci Rep ; 14(1): 15589, 2024 Jul 06.
Article in English | MEDLINE | ID: mdl-38971879

ABSTRACT

Federated learning (FL) has emerged as a significant method for developing machine learning models across multiple devices without centralized data collection. Candidemia, a critical but rare disease in ICUs, poses challenges in early detection and treatment. The goal of this study is to develop a privacy-preserving federated learning framework for predicting candidemia in ICU patients. This approach aims to enhance the accuracy of antifungal drug prescriptions and patient outcomes. This study involved the creation of four predictive FL models for candidemia using data from ICU patients across three hospitals in China. The models were designed to prioritize patient privacy while aggregating learnings across different sites. A unique ensemble feature selection strategy was implemented, combining the strengths of XGBoost's feature importance and statistical test p values. This strategy aimed to optimize the selection of relevant features for accurate predictions. The federated learning models demonstrated significant improvements over locally trained models, with a 9% increase in the area under the curve (AUC) and a 24% rise in true positive ratio (TPR). Notably, the FL models excelled in the combined TPR + TNR metric, which is critical for feature selection in candidemia prediction. The ensemble feature selection method proved more efficient than previous approaches, achieving comparable performance. The study successfully developed a set of federated learning models that significantly enhance the prediction of candidemia in ICU patients. By leveraging a novel feature selection method and maintaining patient privacy, the models provide a robust framework for improved clinical decision-making in the treatment of candidemia.


Subject(s)
Candidemia, Intensive Care Units, Machine Learning, Humans, Candidemia/drug therapy, Candidemia/diagnosis, Antifungal Agents/therapeutic use, China, Male, Female, Delivery of Health Care
7.
IEEE Trans Inf Forensics Secur ; 19: 5751-5766, 2024.
Article in English | MEDLINE | ID: mdl-38993695

ABSTRACT

Conducting secure computations to protect against malicious adversaries is an emerging field of research. Current models designed for malicious security typically necessitate the involvement of two or more servers in an honest-majority setting. Among privacy-preserving data mining techniques, significant attention has been focused on the classification problem. Logistic regression emerges as a well-established classification model, renowned for its impressive performance. We introduce a novel matrix encryption method to build a maliciously secure logistic model. Our scheme involves only a single semi-honest server and is resilient to malicious data providers that may deviate arbitrarily from the scheme. The d-transformation ensures that our scheme achieves indistinguishability (i.e., no adversary can determine, in polynomial time, which of the plaintexts corresponds to a given ciphertext in a chosen-plaintext attack). Malicious activities of data providers can be detected in the verification stage. A lossy compression method is implemented to minimize communication costs while preserving negligible degradation in accuracy. Experiments illustrate that our scheme is highly efficient at analyzing large-scale datasets and achieves accuracy similar to non-private models. The proposed scheme outperforms other maliciously secure frameworks in terms of computation and communication costs.

8.
Front Cardiovasc Med ; 11: 1399138, 2024.
Article in English | MEDLINE | ID: mdl-39036502

ABSTRACT

Background: Federated learning (FL) is a technique for learning prediction models without sharing records between hospitals. Compared to centralized training approaches, the adoption of FL could negatively impact model performance. Aim: This study aimed to evaluate four types of multicenter model development strategies for predicting 30-day mortality for patients undergoing transcatheter aortic valve implantation (TAVI): (1) central, learning one model from a centralized dataset of all hospitals; (2) local, learning one model per hospital; (3) federated averaging (FedAvg), averaging of local model coefficients; and (4) ensemble, aggregating local model predictions. Methods: Data from all 16 Dutch TAVI hospitals from 2013 to 2021 in the Netherlands Heart Registration (NHR) were used. All approaches were internally validated. For the central and federated approaches, external geographic validation was also performed. Predictive performance in terms of discrimination [the area under the ROC curve (AUC-ROC, hereafter referred to as AUC)] and calibration (intercept and slope, and calibration graph) was measured. Results: The dataset comprised 16,661 TAVI records with a 30-day mortality rate of 3.4%. In internal validation the AUCs of central, local, FedAvg, and ensemble models were 0.68, 0.65, 0.67, and 0.67, respectively. The central and local models were miscalibrated by slope, while the FedAvg and ensemble models were miscalibrated by intercept. During external geographic validation, central, FedAvg, and ensemble all achieved a mean AUC of 0.68. Miscalibration was observed for the central, FedAvg, and ensemble models in 44%, 44%, and 38% of the hospitals, respectively. Conclusion: Compared to centralized training approaches, FL techniques such as FedAvg and ensemble demonstrated comparable AUC and calibration. The use of FL techniques should be considered a viable option for clinical prediction model development.
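
The two federated strategies compared in this abstract, averaging local model coefficients (FedAvg) and aggregating local predictions (ensemble), can be sketched as follows. This is a simplified illustration, not the study's code: weighting coefficients by each hospital's record count and taking an unweighted mean of predicted risks are assumptions, as are the function names and example numbers.

```python
def fed_avg(local_coefs: list[list[float]], sample_counts: list[int]) -> list[float]:
    """Average per-hospital coefficient vectors, weighted by local sample size."""
    total = sum(sample_counts)
    n_features = len(local_coefs[0])
    return [
        sum(coefs[j] * n for coefs, n in zip(local_coefs, sample_counts)) / total
        for j in range(n_features)
    ]


def ensemble_predict(local_predictions: list[float]) -> float:
    """Combine the risk predicted by each local model with a simple mean."""
    return sum(local_predictions) / len(local_predictions)


if __name__ == "__main__":
    # Hypothetical logistic-regression coefficients from two hospitals.
    coefs = [[1.0, 2.0], [3.0, 4.0]]
    counts = [1, 3]
    print(fed_avg(coefs, counts))          # [2.5, 3.5]
    print(ensemble_predict([0.02, 0.05]))  # mean predicted 30-day mortality risk
```

In both cases only model parameters or predictions cross hospital boundaries, never patient records, which is the property the study relies on.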

9.
Neural Netw ; 178: 106436, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38908165

ABSTRACT

Incremental learning algorithms have been developed as an efficient solution for fast remodeling in Broad Learning Systems (BLS) without a retraining process. Even though the structure and performance of broad learning are gradually showing superiority, private data leakage in broad learning systems is still a problem that needs to be solved. Recently, the Multiparty Secure Broad Learning System (MSBLS) was proposed to allow two clients to participate in training. However, privacy-preserving broad learning across multiple clients has received limited attention. In this paper, we propose a Self-Balancing Incremental Broad Learning System (SIBLS) with privacy protection by considering the effect of different data sample sizes from clients, which allows multiple clients to be involved in the incremental learning. Specifically, we design a client selection strategy to select two clients in each round by reducing the gap in the number of data samples in the incremental updating process. To ensure the security under the participation of multiple clients, we introduce a mediator in the data encryption and feature mapping process. Three classical datasets are used to validate the effectiveness of our proposed SIBLS, including the MNIST, Fashion and NORB datasets. Experimental results show that our proposed SIBLS achieves performance comparable to MSBLS while achieving better performance than federated learning in terms of accuracy and running time.


Subject(s)
Computer Security, Privacy, Algorithms, Humans, Machine Learning, Computer Neural Networks
10.
Article in English | MEDLINE | ID: mdl-38873338

ABSTRACT

Chest X-rays (CXRs) play a pivotal role in cost-effective clinical assessment of various heart and lung related conditions. The urgency of COVID-19 diagnosis prompted their use in identifying conditions like lung opacity, pneumonia, and acute respiratory distress syndrome in pediatric patients. We propose an AI-driven solution for binary COVID-19 versus non-COVID-19 classification in pediatric CXRs. We present a Federated Self-Supervised Learning (FSSL) framework to enhance Vision Transformer (ViT) performance for COVID-19 detection in pediatric CXRs. ViT's prowess in vision-related binary classification tasks, combined with self-supervised pre-training on adult CXR data, forms the basis of the FSSL approach. We implement our strategy on the Rhino Health Federated Computing Platform (FCP), which ensures privacy and scalability for distributed data. The chest X-ray analysis using the federated SSL (CAFES) model utilizes the FSSL-pre-trained ViT weights and demonstrates gains in accurately detecting COVID-19 when compared with a fully supervised model. Our FSSL-pre-trained ViT showed an area under the precision-recall curve (AUPR) of 0.952, which is 0.231 points higher than the fully supervised model for COVID-19 diagnosis using pediatric data. Our contributions include leveraging vision transformers for effective COVID-19 diagnosis from pediatric CXRs, employing distributed federated learning-based self-supervised pre-training on adult data, and improving pediatric COVID-19 diagnosis performance. This privacy-conscious approach aligns with HIPAA guidelines, paving the way for broader medical imaging applications.

11.
Entropy (Basel) ; 26(6)2024 May 31.
Article in English | MEDLINE | ID: mdl-38920488

ABSTRACT

In light of growing concerns about the misuse of personal data resulting from the widespread use of artificial intelligence technology, it is necessary to implement robust privacy-protection methods. However, existing methods for protecting facial privacy suffer from issues such as poor visual quality, distortion and limited reusability. To tackle this challenge, we propose a novel approach called Diffusion Models for Face Privacy Protection (DIFP). Our method utilizes a face generator that is conditionally controlled and reality-guided to produce high-resolution encrypted faces that are photorealistic while preserving the naturalness and recoverability of the original facial information. We employ a two-stage training strategy to generate protected faces with guidance on identity and style, followed by an iterative technique for improving latent variables to enhance realism. Additionally, we introduce diffusion model denoising for identity recovery, which facilitates the removal of encryption and restoration of the original face when required. Experimental results demonstrate the effectiveness of our method in qualitative privacy protection, achieving high success rates in evading face-recognition tools and enabling near-perfect restoration of occluded faces.

12.
BMC Med Inform Decis Mak ; 24(1): 170, 2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38886772

ABSTRACT

BACKGROUND: Artificial intelligence (AI) has become a pivotal tool in advancing contemporary personalised medicine, with the goal of tailoring treatments to individual patient conditions. This has heightened the demand for access to diverse data from clinical practice and daily life for research, posing challenges due to the sensitive nature of medical information, including genetics and health conditions. Regulations like the Health Insurance Portability and Accountability Act (HIPAA) in the U.S. and the General Data Protection Regulation (GDPR) in Europe aim to strike a balance between data security, privacy, and the imperative for access. RESULTS: We present the Gemelli Generator - Real World Data (GEN-RWD) Sandbox, a modular multi-agent platform designed for distributed analytics in healthcare. Its primary objective is to empower external researchers to leverage hospital data while upholding privacy and ownership, obviating the need for direct data sharing. Docker compatibility adds an extra layer of flexibility, and scalability is assured through modular design, facilitating combinations of Proxy and Processor modules with various graphical interfaces. Security and reliability are reinforced through components like Identity and Access Management (IAM) agent, and a Blockchain-based notarisation module. Certification processes verify the identities of information senders and receivers. CONCLUSIONS: The GEN-RWD Sandbox architecture achieves a good level of usability while ensuring a blend of flexibility, scalability, and security. Featuring a user-friendly graphical interface catering to diverse technical expertise, its external accessibility enables personnel outside the hospital to use the platform. Overall, the GEN-RWD Sandbox emerges as a comprehensive solution for healthcare distributed analytics, maintaining a delicate equilibrium between accessibility, scalability, and security.


Subject(s)
Computer Security, Confidentiality, Humans, Computer Security/standards, Confidentiality/standards, Artificial Intelligence, Hospitals
13.
Sensors (Basel) ; 24(10)2024 May 08.
Article in English | MEDLINE | ID: mdl-38793843

ABSTRACT

Edge computing provides higher computational power and lower transmission latency by offloading tasks to nearby edge nodes with available computational resources to meet the requirements of time-sensitive tasks and computationally complex tasks. Resource allocation schemes are essential to this process. To allocate resources effectively, it is necessary to attach metadata to a task to indicate what kind of resources are needed and how many computation resources are required. However, these metadata are sensitive and can be exposed to eavesdroppers, which can lead to privacy breaches. In addition, edge nodes are vulnerable to corruption because of their limited cybersecurity defenses. Attackers can easily obtain end-device privacy through unprotected metadata or corrupted edge nodes. To address this problem, we propose a metadata privacy resource allocation scheme that uses searchable encryption to protect metadata privacy and zero-knowledge proofs to resist semi-malicious edge nodes. We have formally proven that our proposed scheme satisfies the required security concepts and experimentally demonstrated the effectiveness of the scheme.

14.
Sensors (Basel) ; 24(10)2024 May 11.
Article in English | MEDLINE | ID: mdl-38793906

ABSTRACT

Smartwatch health sensor data are increasingly utilized in smart health applications and patient monitoring, including stress detection. However, such medical data often comprise sensitive personal information and are resource-intensive to acquire for research purposes. In response to this challenge, we introduce the privacy-aware synthetization of multi-sensor smartwatch health readings related to moments of stress, employing Generative Adversarial Networks (GANs) and Differential Privacy (DP) safeguards. Our method not only protects patient information but also enhances data availability for research. To ensure its usefulness, we test synthetic data from multiple GANs and employ different data enhancement strategies on an actual stress detection task. Our GAN-based augmentation methods demonstrate significant improvements in model performance, with private DP training scenarios observing an 11.90-15.48% increase in F1-score, while non-private training scenarios still see a 0.45% boost. These results underline the potential of differentially private synthetic data in optimizing utility-privacy trade-offs, especially with the limited availability of real training samples. Through rigorous quality assessments, we confirm the integrity and plausibility of our synthetic data, which, however, are significantly impacted when increasing privacy requirements.


Subject(s)
Privacy, Wearable Electronic Devices, Humans, Physiologic Monitoring/methods, Physiologic Monitoring/instrumentation, Algorithms
15.
Sensors (Basel) ; 24(10)2024 May 16.
Article in English | MEDLINE | ID: mdl-38794019

ABSTRACT

Differential privacy has emerged as a practical technique for privacy-preserving deep learning. However, recent studies on privacy attacks have demonstrated vulnerabilities in the existing differential privacy implementations for deep models. While encryption-based methods offer robust security, their computational overheads are often prohibitive. To address these challenges, we propose a novel differential privacy-based image generation method. Our approach employs two distinct noise types: one makes the image unrecognizable to humans, preserving privacy during transmission, while the other maintains features essential for machine learning analysis. This allows the deep learning service to provide accurate results, without compromising data privacy. We demonstrate the feasibility of our method on the CIFAR100 dataset, which offers a realistic complexity for evaluation.

16.
Entropy (Basel) ; 26(5)2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38785602

ABSTRACT

In the realm of federated learning (FL), the exchange of model data may inadvertently expose sensitive information of participants, leading to significant privacy concerns. Existing FL privacy-preserving techniques, such as differential privacy (DP) and secure multi-party computing (SMC), though offering viable solutions, face practical challenges including reduced performance and complex implementations. To overcome these hurdles, we propose a novel and pragmatic approach to privacy preservation in FL by employing localized federated updates (LF3PFL) aimed at enhancing the protection of participant data. Furthermore, this research refines the approach by incorporating cross-entropy optimization, carefully fine-tuning measurement, and improving information loss during the model training phase to enhance both model efficacy and data confidentiality. Our approach is theoretically supported and empirically validated through extensive simulations on three public datasets: CIFAR-10, Shakespeare, and MNIST. We evaluate its effectiveness by comparing training accuracy and privacy protection against state-of-the-art techniques. Our experiments, which involve five distinct local models (Simple-CNN, ModerateCNN, Lenet, VGG9, and Resnet18), provide a comprehensive assessment across a variety of scenarios. The results clearly demonstrate that LF3PFL not only maintains competitive training accuracies but also significantly improves privacy preservation, surpassing existing methods in practical applications. This balance between privacy and performance underscores the potential of localized federated updates as a key component in future FL privacy strategies, offering a scalable and effective solution to one of the most pressing challenges in FL.

17.
PeerJ Comput Sci ; 10: e1932, 2024.
Article in English | MEDLINE | ID: mdl-38660199

ABSTRACT

Data aggregation plays a critical role in sensor networks for efficient data collection. However, the assumption of uniform initial energy levels among sensors in existing algorithms is unrealistic in practical production applications. This discrepancy in initial energy levels significantly impacts data aggregation in sensor networks. To address this issue, we propose Data Aggregation with Different Initial Energy (DADIE), a novel algorithm that aims to enhance energy-saving, privacy-preserving efficiency, and reduce node death rates in sensor networks with varying initial energy nodes. DADIE considers the transmission distance between nodes and their initial energy levels when forming the network topology, while also limiting the number of child nodes. Furthermore, DADIE reconstructs the aggregation tree before each round of data transmission. This allows nodes closer to the receiving end with higher initial energy to undertake more data aggregation and transmission tasks while limiting energy consumption. As a result, DADIE effectively reduces the node death rate and improves the efficiency of data transmission throughout the network. To enhance network security, DADIE establishes secure transmission channels between transmission nodes prior to data transmission, and it employs slice-and-mix technology within the network. Our experimental simulations demonstrate that the proposed DADIE algorithm effectively resolves the data aggregation challenges in sensor networks with varying initial energy nodes. It achieves 5-20% lower communication overhead and energy consumption, 10-20% higher security, and 10-30% lower node mortality than existing algorithms.
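
The slice-and-mix technique this abstract mentions can be sketched generically as follows. This is not the DADIE algorithm itself: the integer readings, the slice range, the ring-style assignment of slices to peers, and the function names are all assumptions. Each node splits its private reading into random slices, keeps one, and hands the others to peers. No single peer sees a full reading, yet the network-wide sum is unchanged.

```python
import random


def slice_value(value: int, num_slices: int, rng: random.Random) -> list[int]:
    """Split value into num_slices random integer pieces that sum to value."""
    pieces = [rng.randint(-1000, 1000) for _ in range(num_slices - 1)]
    pieces.append(value - sum(pieces))
    return pieces


def slice_and_mix(readings: list[int], rng: random.Random) -> list[int]:
    """Return per-node mixed values after every node distributes its slices.

    Assumption: node i keeps slice 0 and sends slice j to node (i + j) % n,
    so each node receives exactly one slice from every other node.
    """
    n = len(readings)
    mixed = [0] * n
    for i, reading in enumerate(readings):
        for j, piece in enumerate(slice_value(reading, n, rng)):
            mixed[(i + j) % n] += piece
    return mixed


if __name__ == "__main__":
    rng = random.Random(7)
    readings = [10, 20, 30, 40]  # hypothetical private sensor readings
    mixed = slice_and_mix(readings, rng)
    print(mixed, sum(mixed))     # mixed values hide the readings; the sum is still 100
```

The aggregation tree then only needs to sum the mixed values, so intermediate nodes never handle an individual sensor's original reading.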

18.
Bioengineering (Basel) ; 11(4)2024 Mar 30.
Article in English | MEDLINE | ID: mdl-38671762

ABSTRACT

Malaria is a life-threatening disease caused by the parasite Plasmodium falciparum, which affects human red blood cells. Therefore, it is important to have an effective computer-aided system in place for early detection and treatment. The visual heterogeneity of the malaria dataset is highly complex and dynamic; therefore, a higher number of images is needed to train the machine learning (ML) models effectively. However, hospitals as well as medical institutions do not share the medical image data for collaboration due to the General Data Protection Regulation (GDPR) and the Data Protection Act (DPA). To overcome this collaborative challenge, our research utilised real-time medical image data in the framework of federated learning (FL). We have used state-of-the-art ML models that include the ResNet-50 and DenseNet in a federated learning framework. We experimented with both models in different settings on a malaria dataset comprising 27,560 publicly available images. Our preliminary results showed that, with eight clients, the DenseNet model achieved higher accuracy (75%) than ResNet-50 (72%); with four clients, the two models showed a similar accuracy of 94%; and with six clients, the DenseNet model performed well with an accuracy of 92%, while ResNet-50 achieved only 72%. The federated learning framework enhances the accuracy due to its decentralised nature, continuous learning, and effective communication among clients, as well as the efficient local adaptation. The contribution of this research work is the use of a federated learning architecture among distinct clients to ensure data privacy and comply with GDPR.

19.
Comput Biol Med ; 173: 108351, 2024 May.
Article in English | MEDLINE | ID: mdl-38520921

ABSTRACT

Single-cell transcriptomics data provides crucial insights into patients' health, yet poses significant privacy concerns. Genomic data privacy attacks can have deep implications, encompassing not only the patients' health information but also extending widely to compromise their families'. Moreover, the permanence of leaked data exacerbates the challenges, making retraction an impossibility. While extensive efforts have been directed towards clustering single-cell transcriptomics data, addressing critical challenges, especially in the realm of privacy, remains pivotal. This paper introduces an efficient, fast, privacy-preserving approach for clustering single-cell RNA-sequencing (scRNA-seq) datasets. The key contributions include ensuring data privacy, achieving high-quality clustering, accommodating the high dimensionality inherent in the datasets, and maintaining reasonable computation time for big-scale datasets. Our proposed approach utilizes the map-reduce scheme to parallelize clustering, addressing intensive calculation challenges. Intel Software Guard eXtension (SGX) processors are used to ensure the security of sensitive code and data during processing. Additionally, the approach incorporates a logarithm transformation as a preprocessing step, employs non-negative matrix factorization for dimensionality reduction, and utilizes parallel k-means for clustering. The approach fully leverages the computing capabilities of all processing resources within a secure private cloud environment. Experimental results demonstrate the efficacy of our approach in preserving patient privacy while surpassing state-of-the-art methods in both clustering quality and computation time. Our method consistently achieves a minimum of 7% higher Adjusted Rand Index (ARI) than existing approaches, contingent on dataset size. 
Additionally, due to parallel computations and dimensionality reduction, our approach exhibits efficiency, converging to very good results in less than 10 seconds for a scRNA-seq dataset with 5000 genes and 6000 cells when prioritizing privacy and under two seconds without privacy considerations. Availability and implementation: code and datasets are available at https://github.com/University-of-Windsor/PPPCT.


Subject(s)
Privacy, Software, Humans, Algorithms, Gene Expression Profiling, Cluster Analysis, RNA Sequence Analysis
20.
J Comput Biol ; 31(3): 197-212, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38531050

ABSTRACT

Finding highly similar regions of genomic sequences is a basic computation of genomic analysis. Genomic analyses on a large amount of data are efficiently processed in cloud environments, but outsourcing them to a cloud raises concerns over the privacy and security issues. Homomorphic encryption (HE) is a powerful cryptographic primitive that preserves privacy of genomic data in various analyses processed in an untrusted cloud environment. We introduce an efficient algorithm for finding highly similar regions of two homomorphically encrypted sequences, and describe how to implement it using the bit-wise and word-wise HE schemes. In the experiment, our algorithm outperforms an existing algorithm by up to two orders of magnitude in terms of elapsed time. Overall, it finds highly similar regions of the sequences in real data sets in a feasible time.


Subject(s)
Computer Security, Genomics, Algorithms