Results 1 - 20 of 52
1.
Bioinformatics ; 39(39 Suppl 1): i168-i176, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387172

ABSTRACT

The rapid improvements in genomic sequencing technology have led to the proliferation of locally collected genomic datasets. Given the sensitivity of genomic data, it is crucial to conduct collaborative studies while preserving the privacy of the individuals. However, before starting any collaborative research effort, the quality of the data needs to be assessed. One of the essential steps of the quality control process is population stratification: identifying the presence of genetic difference in individuals due to subpopulations. One of the common methods used to group genomes of individuals based on ancestry is principal component analysis (PCA). In this article, we propose a privacy-preserving framework which utilizes PCA to assign individuals to populations across multiple collaborators as part of the population stratification step. In our proposed client-server-based scheme, we initially let the server train a global PCA model on a publicly available genomic dataset which contains individuals from multiple populations. The global PCA model is later used to reduce the dimensionality of the local data by each collaborator (client). After adding noise to achieve local differential privacy (LDP), the collaborators send metadata (in the form of their local PCA outputs) about their research datasets to the server, which then aligns the local PCA results to identify the genetic differences among collaborators' datasets. Our results on real genomic data show that the proposed framework can perform population stratification analysis with high accuracy while preserving the privacy of the research participants.


Subject(s)
Genomics, Privacy, Humans, Chromosome Mapping, Metadata, Principal Component Analysis
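
The client-side step described in the abstract above (project local data with the server's global PCA model, then add noise for local differential privacy) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Laplace mechanism, the clipping bound `bound`, and the function names are all hypothetical choices made for this sketch, and the abstract does not say which noise mechanism the paper uses.

```python
import random


def project(genotype, components):
    """Project one genotype vector onto the principal axes (rows of `components`)."""
    return [sum(g * c for g, c in zip(genotype, axis)) for axis in components]


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def ldp_projection(genotype, components, bound, epsilon, rng):
    """Clip the projected coordinates to [-bound, bound] and add Laplace noise.

    Assumption for this sketch: after clipping, any two inputs differ by at
    most 2 * bound in each of the d coordinates, so the L1 sensitivity is
    2 * bound * d and the Laplace scale is that sensitivity divided by epsilon.
    """
    clipped = [max(-bound, min(bound, v)) for v in project(genotype, components)]
    scale = 2.0 * bound * len(clipped) / epsilon
    return [v + laplace_noise(scale, rng) for v in clipped]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 2-component model over 3 variants (values are made up).
    components = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]
    print(ldp_projection([0, 1, 2], components, bound=3.0, epsilon=1.0, rng=rng))
```

A smaller `epsilon` gives stronger privacy but noisier coordinates, which is the accuracy trade-off the abstract refers to.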
2.
Bioinformatics ; 39(10), 2023 10 03.
Article in English | MEDLINE | ID: mdl-37856329

ABSTRACT

MOTIVATION: Genome-wide association studies (GWAS) benefit from the increasing availability of genomic data and cross-institution collaborations. However, sharing data across institutional boundaries jeopardizes medical data confidentiality and patient privacy. While modern cryptographic techniques provide formal secure guarantees, the substantial communication and computational overheads hinder the practical application of large-scale collaborative GWAS. RESULTS: This work introduces an efficient framework for conducting collaborative GWAS on distributed datasets, maintaining data privacy without compromising the accuracy of the results. We propose a novel two-step strategy aimed at reducing communication and computational overheads, and we employ iterative and sampling techniques to ensure accurate results. We instantiate our approach using logistic regression, a commonly used statistical method for identifying associations between genetic markers and the phenotype of interest. We evaluate our proposed methods using two real genomic datasets and demonstrate their robustness in the presence of between-study heterogeneity and skewed phenotype distributions using a variety of experimental settings. The empirical results show the efficiency and applicability of the proposed method and the promise for its application for large-scale collaborative GWAS. AVAILABILITY AND IMPLEMENTATION: The source code and data are available at https://github.com/amioamo/TDS.


Subject(s)
Genome-Wide Association Study, Privacy, Humans, Genome-Wide Association Study/methods, Genomics/methods, Confidentiality, Software
3.
IEEE Trans Knowl Data Eng ; 35(12): 12264-12281, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37974954

ABSTRACT

Identifying anomalies in data is vital in many domains, including medicine, finance, and national security. However, privacy concerns pose a significant roadblock to carrying out such an analysis. Since existing privacy definitions do not allow good accuracy when doing outlier analysis, the notion of sensitive privacy has been recently proposed to deal with this problem. Sensitive privacy makes it possible to analyze data for anomalies with practically meaningful accuracy while providing a strong guarantee similar to differential privacy, which is the prevalent privacy standard today. In this work, we relate sensitive privacy to other important notions of data privacy so that one can port the technical developments and private mechanism constructions from these related concepts to sensitive privacy. Sensitive privacy critically depends on the underlying anomaly model. We develop a novel n-step lookahead mechanism to efficiently answer arbitrary outlier queries, which provably guarantees sensitive privacy if we restrict our attention to a common class of anomaly models. We also provide general constructions to give sensitively private mechanisms for identifying anomalies and show the conditions under which the constructions would be optimal.

4.
IEEE Trans Knowl Data Eng ; 35(1): 1-15, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36506788

ABSTRACT

Property preserving encryption techniques have significantly advanced the utility of encrypted data in various data outsourcing settings (e.g., the cloud). However, while preserving certain properties (e.g., the prefixes or order of the data) in the encrypted data, such encryption schemes are typically limited to specific data types (e.g., prefix-preserved IP addresses) or applications (e.g., range queries over order-preserved data), and highly vulnerable to the emerging inference attacks which may greatly limit their applications in practice. In this paper, to the best of our knowledge, we make the first attempt to generalize the prefix preserving encryption via prefix-aware encoding that is not only applicable to more general data types (e.g., geo-locations, market basket data, DNA sequences, numerical data and timestamps) but also secure against the inference attacks. Furthermore, we present a generalized multi-view outsourcing framework that generates multiple indistinguishable data views in which one view fully preserves the utility for data analysis, and its accurate analysis result can be obliviously retrieved. Given any specified privacy leakage bound, the computation and communication overheads are minimized to effectively defend against different inference attacks. We empirically evaluate the performance of our outsourcing framework against two common inference attacks on two different real datasets: the check-in location dataset and network traffic dataset, respectively. The experimental results demonstrate that our proposed framework preserves both privacy (with bounded leakage and indistinguishability of data views) and utility (with 100% analysis accuracy).

5.
IEEE Intell Syst ; 37(4): 88-96, 2022.
Article in English | MEDLINE | ID: mdl-36467258

ABSTRACT

Intelligently responding to a pandemic like Covid-19 requires sophisticated models over accurate real-time data, which is typically lacking at the start, e.g., due to deficient population testing. In such times, crowdsensing of spatially tagged disease-related symptoms provides an alternative way of acquiring real-time insights about the pandemic. Existing crowdsensing systems aggregate and release data for pre-fixed regions, e.g., counties. However, the insights obtained from such aggregates do not provide useful information about smaller regions - e.g., neighborhoods where outbreaks typically occur - and the aggregate-and-release method is vulnerable to privacy attacks. Therefore, we propose a novel differentially private method to obtain accurate insights from crowdsensed data for any number of regions specified by the users (e.g., researchers and policy makers) without compromising the privacy of the data contributors. Our approach, which has been implemented and deployed, informs the development of future privacy-preserving intelligent systems for longitudinal and spatial data analytics.

6.
J Biomed Inform ; 117: 103714, 2021 05.
Article in English | MEDLINE | ID: mdl-33711538

ABSTRACT

With cloud computing being widely adopted for conducting genome-wide association studies (GWAS), verifying the integrity of outsourced GWAS computation remains an open problem. Here, we propose two novel algorithms to generate synthetic SNPs that are indistinguishable from real SNPs. The first method creates synthetic SNPs based on the phenotype vector, while the second approach creates synthetic SNPs based on real SNPs that are most similar to the phenotype vector. The time complexity of the first approach and the second approach is O(m) and O(m log n^2), respectively, where m is the number of subjects and n is the number of SNPs. Furthermore, through a game theoretic analysis, we demonstrate that it is possible to incentivize honest behavior by the server by coupling appropriate payoffs with randomized verification. We conduct extensive experiments on our proposed methods, and the results show that, beyond a formal adversarial model, when only a few synthetic SNPs are generated and mixed into the real data they cannot be distinguished from the real SNPs even by a variety of predictive machine learning models. We demonstrate that the proposed approach can ensure that logistic regression for GWAS can be outsourced in an efficient and trustworthy way.


Subject(s)
Cloud Computing, Genome-Wide Association Study, Algorithms, Humans, Phenotype, Polymorphism, Single Nucleotide
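
The abstract above says honest behavior can be incentivized by pairing payoffs with randomized verification. A toy calculation may make that concrete. The payoff model here is an assumption made for illustration and is not taken from the paper: the server earns `gain` by cheating when it is not checked, pays `penalty` when it is caught, and the client verifies with probability `p`. With the honest payoff normalized to 0, cheating stops being profitable once (1 - p) * gain - p * penalty <= 0, that is, once p >= gain / (gain + penalty).

```python
def cheating_payoff(p, gain, penalty):
    """Expected payoff of a cheating server under verification probability p."""
    return (1.0 - p) * gain - p * penalty


def min_verification_probability(gain, penalty):
    """Smallest p at which cheating is no better than honesty (payoff 0).

    Hypothetical model: both parameters are illustrative, not from the paper.
    """
    if gain < 0 or penalty <= 0:
        raise ValueError("gain must be non-negative and penalty positive")
    return gain / (gain + penalty)


if __name__ == "__main__":
    p_star = min_verification_probability(gain=1.0, penalty=3.0)
    print(p_star, cheating_payoff(p_star, 1.0, 3.0))
```

Under this model, a larger penalty lets the client verify less often, which is why only a few synthetic SNPs may be enough to check the server.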
7.
Comput Secur ; 97, 2020 Oct.
Article in English | MEDLINE | ID: mdl-33223585

ABSTRACT

Secure computation of equivalence has fundamental application in many different areas, including health-care. We study this problem in the context of matching an individual's identity to link medical records across systems under the socialist millionaires' problem: Two millionaires wish to determine if their fortunes are equal without disclosing their net worth (Boudot, et al. 2001). In Theorem 2, we show that when a "greater than" algorithm is carried out on a totally ordered set it is easy to achieve secure matching without additional rounds of communication. We present this efficient solution to assess equivalence using a set intersection algorithm designed for "greater than" computation and demonstrate its effectiveness on equivalence of arbitrary data values, as well as demonstrate how it meets regulatory criteria for risk of disclosure.
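
To illustrate how a set-intersection test for "greater than" can also decide equality on a totally ordered set, here is a plaintext sketch. It uses the well-known 0/1-encoding of Lin and Tzeng, in which x > y exactly when the 1-encoding of x and the 0-encoding of y share an element. Whether the paper uses this exact encoding is an assumption, and the sketch performs no cryptography at all: in a real protocol the intersection would be computed privately. The equality check below also runs the comparison in both directions, which is the naive reduction rather than the paper's construction.

```python
def one_encoding(x, n):
    """Prefixes of the n-bit string of x that end in a 1 bit."""
    bits = format(x, f"0{n}b")
    return {bits[: i + 1] for i in range(n) if bits[i] == "1"}


def zero_encoding(y, n):
    """For each 0 bit of y, the prefix before it followed by a 1."""
    bits = format(y, f"0{n}b")
    return {bits[:i] + "1" for i in range(n) if bits[i] == "0"}


def greater_than(x, y, n):
    # x > y iff the two encodings intersect.
    return bool(one_encoding(x, n) & zero_encoding(y, n))


def equal(x, y, n):
    # On a totally ordered set, x == y iff neither value exceeds the other.
    return not greater_than(x, y, n) and not greater_than(y, x, n)


if __name__ == "__main__":
    print(greater_than(5, 3, 3), greater_than(3, 5, 3), equal(4, 4, 3))
```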

8.
Article in English | MEDLINE | ID: mdl-31885522

ABSTRACT

In Attribute-Based Access Control (ABAC), access to resources is given based on the attributes of subjects, objects, and environment. There is an imminent need for the development of efficient algorithms that enable migration to ABAC. However, existing policy mining approaches do not consider possible adaptation to the policy of a similar organization. In this article, we address the problem of automatically determining an optimal assignment of attribute values to subjects for enabling the desired accesses to be granted while minimizing the number of ABAC rules used by each subject or other appropriate metrics. We show the problem to be NP-Complete and propose a heuristic solution.
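
For readers unfamiliar with ABAC, the sketch below shows how a request is evaluated against attribute-based rules: access is granted when some rule's required subject, object, and environment attribute values all match. The rule layout and every attribute name are hypothetical, chosen for illustration. The sketch covers only policy evaluation, not the attribute-assignment optimization the abstract describes.

```python
def rule_matches(rule, subject, obj, env):
    """True if every attribute value required by the rule is present."""
    return (
        all(subject.get(k) == v for k, v in rule.get("subject", {}).items())
        and all(obj.get(k) == v for k, v in rule.get("object", {}).items())
        and all(env.get(k) == v for k, v in rule.get("environment", {}).items())
    )


def is_permitted(rules, subject, obj, env, action):
    """Grant access if at least one rule for this action matches."""
    return any(
        rule["action"] == action and rule_matches(rule, subject, obj, env)
        for rule in rules
    )


if __name__ == "__main__":
    # Hypothetical policy: on-shift nurses may read records of their own ward.
    rules = [
        {
            "action": "read",
            "subject": {"role": "nurse", "ward": "cardiology"},
            "object": {"type": "record", "ward": "cardiology"},
            "environment": {"shift": "on"},
        }
    ]
    nurse = {"role": "nurse", "ward": "cardiology"}
    record = {"type": "record", "ward": "cardiology"}
    print(is_permitted(rules, nurse, record, {"shift": "on"}, "read"))
```

Because each subject's attribute values determine which rules can fire, changing the assignment of values to subjects changes how many rules are needed, which is the quantity the abstract proposes to minimize.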

9.
Comput Secur ; 86: 183-205, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31662590

ABSTRACT

Over the last few years, various types of access control models have been proposed for expressing the growing needs of organizations. Out of these, there is an increasing interest towards specification and enforcement of flexible and dynamic decision making security policies using Attribute Based Access Control (ABAC). However, it is not easy to migrate an existing security policy specified in a different model into ABAC. Furthermore, there exists no comprehensive approach that can specify, enforce and manage ABAC policies along with other policies potentially already existing in the organization as a unified security policy. In this article, we present a unique and flexible solution that enables concurrent specification and enforcement of such security policies through storing and querying data in a multi-dimensional and multi-granular data model. Specifically, we present a unified database schema, similar to that traditionally used in data warehouse design, that can represent different types of access control policies and store relevant policies as in-memory data, thereby significantly reducing the execution time of access request evaluation. We also present a novel approach for combining multiple access control policies through meta-policies. For ease of management, an administrative schema is presented that can specify different types of administrative policies. Extensive experiments on a wide range of data sets demonstrate the viability of the proposed approach.

10.
IEEE Internet Comput ; 22(2): 32-41, 2018.
Article in English | MEDLINE | ID: mdl-29867290

ABSTRACT

Patient health data are often found spread across various sources. However, precision medicine and personalized care require access to complete medical records. The first step towards this is to enable the linkage of health records spread across different sites. Existing record linkage solutions assume that data is centralized with no privacy/security concerns restricting sharing. However, that is often untrue. Therefore, we design and implement a portable method for privacy-preserving record linkage based on garbled circuits to accurately and securely match records. We also develop a novel approximate matching mechanism that significantly improves efficiency.

11.
IEEE Trans Priv ; 1: 3-18, 2024.
Article in English | MEDLINE | ID: mdl-38979543

ABSTRACT

Privacy Enhancing Technologies (PETs) have the potential to enable collaborative analytics without compromising privacy. This is extremely important, because collaborative analytics can allow us to extract real value from the large amounts of data that are collected in domains such as healthcare, finance, and national security, among others. In order to foster innovation and move PETs from the research labs to actual deployment, the U.S. and U.K. governments partnered in 2021 to propose the PETs prize challenge asking for privacy-enhancing solutions for two of the biggest problems facing us today: financial crime prevention and pandemic response. This article presents the Rutgers ScarletPets privacy-preserving federated learning approach to identify anomalous financial transactions in a payment network system (PNS). This approach utilizes a two-step anomaly detection methodology to solve the problem. In the first step, features are mined based on account-level data and labels, and then a privacy-preserving encoding scheme is used to augment these features to the data held by the PNS. In the second step, the PNS learns a highly accurate classifier from the augmented data. Our proposed approach has two major advantages: 1) there is no noteworthy drop in accuracy between the federated and the centralized setting, and 2) our approach is flexible since the PNS can keep improving its model and features to build a better classifier without imposing any additional computational or privacy burden on the banks. Notably, our solution won the first prize in the US for its privacy, utility, efficiency, and flexibility.

12.
IEEE Internet Things J ; 11(3): 3779-3791, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38283301

ABSTRACT

Current Internet of Things (IoT) devices provide a diverse range of functionalities, ranging from measurement and dissemination of sensory data observation, to computation services for real-time data stream processing. In extreme situations such as emergencies, a significant benefit of IoT devices is that they can help gain a more complete situational understanding of the environment. However, this requires the ability to utilize IoT resources while taking into account location, battery life, and other constraints of the underlying edge and IoT devices. A dynamic approach is proposed for orchestration and management of distributed workflow applications using services available in cloud data centers, deployed on servers, or IoT devices at the network edge. Our proposed approach is specifically designed for knowledge-driven business process workflows that are adaptive, interactive, evolvable and emergent. A comprehensive empirical evaluation shows that the proposed approach is effective and resilient to situational changes.

13.
IEEE Trans Emerg Top Comput ; 11(1): 208-223, 2023.
Article in English | MEDLINE | ID: mdl-37274839

ABSTRACT

NoSQL databases are being increasingly used for efficient management of high volumes of unstructured data in applications like information retrieval, natural language processing, social computing, etc. However, unlike traditional databases, data protection measures such as access control for these databases are still in their infancy, which could lead to significant vulnerabilities and security/privacy issues as their adoption increases. Attribute-based Access Control (ABAC), which provides a flexible and dynamic solution to access control, can be effective for mediating accesses in typical usage scenarios for NoSQL databases. In this paper, we propose a novel methodology for enabling ABAC in NoSQL databases. Specifically we consider MongoDB, which is one of the most popular NoSQL databases in use today. We present an approach to both specify ABAC access control policies and to enforce them when an actual access request has been made. MongoDB Wire Protocol is used for extracting and processing appropriate information from the requests. We also present a method for supporting dynamic access decisions using environmental attributes and handling of ad-hoc access requests through digitally signed user attributes. Results from an extensive set of experiments on the Enron corpus as well as on synthetically generated data demonstrate the scalability of our approach. Finally, we provide details of our implementation on MongoDB and share a Github repository so that any organization can download and deploy the same for enabling ABAC in their own MongoDB installations.

14.
Article in English | MEDLINE | ID: mdl-38562180

ABSTRACT

Reproducibility, transparency, representation, and privacy underpin trust in genomics research in general and genome-wide association studies (GWAS) in particular. Concerns about these issues can be mitigated by technologies that address privacy protection, quality control, and verifiability of GWAS. However, many of the existing technological solutions have been developed in isolation and may address one aspect of reproducibility, transparency, representation, and privacy of GWAS while unknowingly impacting other aspects. As a consequence, the current patchwork of technological tools addresses issues with GWAS only partially and in an overlapping manner, sometimes even creating more problems. This paper addresses the progress in a field that creates technological solutions that augment the acceptance and security of population genetic analyses. The text identifies areas that are falling behind in technical implementation or where there is insufficient research. We make the case that a full understanding of the different GWAS settings, technological tools and new research directions can holistically address the requirements for the acceptance of GWAS.

15.
IEEE Trans Serv Comput ; 16(1): 162-176, 2023.
Article in English | MEDLINE | ID: mdl-36776787

ABSTRACT

The emergence of cloud and edge computing has enabled rapid development and deployment of Internet-centric distributed applications. There are many platforms and tools that help users develop distributed business process (BP) applications by composing relevant service components in a plug-and-play manner. However, there is no guarantee that a BP application developed in this way is fault-free. In this paper, we formalize the problem of collaborative BP fault resolution, which aims to utilize information from existing fault-free BPs that use similar services to resolve faults in a user-developed BP. We present an approach based on association analysis of pairwise transformations between a faulty BP and existing BPs to identify the smallest possible set of transformations to resolve the fault(s) in the user-developed BP. An extensive experimental evaluation over both synthetically generated faulty BPs and real BPs developed by users shows the effectiveness of our approach.

16.
AMIA Jt Summits Transl Sci Proc ; 2023: 534-543, 2023.
Article in English | MEDLINE | ID: mdl-37351796

ABSTRACT

Kinship relationship estimation plays a significant role in today's genome studies. Since genetic data are mostly stored and protected in different silos, retrieving the desirable kinship relationships across federated data warehouses is a non-trivial problem. The ability to identify and connect related individuals is important for both research and clinical applications. In this work, we propose a new privacy-preserving kinship relationship estimation framework: Incremental Update Kinship Identification (INK). The proposed framework includes three key components that allow us to control the balance between privacy and accuracy (of kinship estimation): an incremental process coupled with the use of auxiliary information and informative scores. Our empirical evaluation shows that INK can achieve higher kinship identification correctness while exposing fewer genetic markers.

17.
Article in English | MEDLINE | ID: mdl-38094985

ABSTRACT

Outlier detection is a fundamental data analytics technique used in many security applications. Numerous outlier detection techniques exist, and in most cases they are used to directly identify outliers without any interaction. Typically, the underlying data is high dimensional and complex. Even though outliers may be identified, since humans can easily grasp only low-dimensional spaces, it is difficult for a security expert to understand or visualize why a particular event or record has been identified as an outlier. In this paper, we study the extent to which outlier detection techniques work in smaller dimensions and how well dimensionality reduction techniques still enable accurate detection of outliers. This can help us to understand the extent to which data can be visualized while still retaining the intrinsic outlyingness of the outliers.

18.
Proc ACM Workshop Priv Electron Soc ; 2022: 109-113, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36507926

ABSTRACT

Symptoms-tracking applications allow crowdsensing of health and location related data from individuals to track the spread and outbreaks of infectious diseases. During the COVID-19 pandemic, for the first time in history, these apps were widely adopted across the world to combat the pandemic. However, due to the sensitive nature of the data collected by these apps, serious privacy concerns were raised and apps were critiqued for their insufficient privacy safeguards. The Covid Nearby project was launched to develop a privacy-focused symptoms-tracking app and to understand the privacy preferences of users in health emergencies. In this work, we draw on the insights from the Covid Nearby users' data, and present an analysis of the significantly varying trends in users' privacy preferences with respect to demographics, attitude towards information sharing, and health concerns, e.g. after being possibly exposed to COVID-19. These results and insights can inform health informatics researchers and policy designers in developing more socially acceptable health apps in the future.

19.
Article in English | MEDLINE | ID: mdl-36655144

ABSTRACT

Open data sets that contain personal information are susceptible to adversarial attacks even when anonymized. By performing low-cost joins on multiple datasets with shared attributes, malicious users of open data portals might get access to information that violates individuals' privacy. However, open data sets are primarily published using a release-and-forget model, whereby data owners and custodians have little to no cognizance of these privacy risks. We address this critical gap by developing a visual analytic solution that enables data defenders to gain awareness about the disclosure risks in local, joinable data neighborhoods. The solution is derived through a design study with data privacy researchers, where we initially play the role of a red team and engage in an ethical data hacking exercise based on privacy attack scenarios. We use this problem and domain characterization to develop a set of visual analytic interventions as a defense mechanism and realize them in PRIVEE, a visual risk inspection workflow that acts as a proactive monitor for data defenders. PRIVEE uses a combination of risk scores and associated interactive visualizations to let data defenders explore vulnerable joins and interpret risks at multiple levels of data granularity. We demonstrate how PRIVEE can help emulate the attack strategies and diagnose disclosure risks through two case studies with data privacy experts.
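
The join-based disclosure risk described in the abstract above can be illustrated with a small measurement: the fraction of records in one dataset that match exactly one record in another dataset on their shared attributes. This is a hypothetical risk indicator chosen for illustration. The abstract does not specify PRIVEE's risk scores, and the attribute names and records below are invented.

```python
from collections import Counter


def linkable_fraction(left, right, shared):
    """Fraction of `left` rows matching exactly one `right` row on `shared`."""
    if not left:
        return 0.0
    # Count how often each combination of shared attribute values occurs.
    counts = Counter(tuple(row[a] for a in shared) for row in right)
    hits = sum(1 for row in left if counts[tuple(row[a] for a in shared)] == 1)
    return hits / len(left)


if __name__ == "__main__":
    # Hypothetical "anonymized" health data and a public roster.
    health = [
        {"zip": "07102", "age": 34, "diagnosis": "flu"},
        {"zip": "07103", "age": 51, "diagnosis": "asthma"},
    ]
    roster = [
        {"zip": "07102", "age": 34, "name": "A"},
        {"zip": "07103", "age": 51, "name": "B"},
        {"zip": "07103", "age": 51, "name": "C"},
    ]
    print(linkable_fraction(health, roster, ["zip", "age"]))
```

A record that matches a single row in the other dataset can be re-identified by the join, while a record that matches several rows remains ambiguous.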

20.
IFIP Adv Inf Commun Technol ; 648: 360-376, 2022 Jun.
Article in English | MEDLINE | ID: mdl-36544863

ABSTRACT

Hyperledger Fabric (HLF) is an open-source platform for deploying enterprise-level permissioned blockchains where users from multiple organizations can participate. Preventing unauthorized access to resources in such blockchains is of critical importance. Towards addressing this requirement, HLF supports different access control models. However, support for Attribute-Based Access Control (ABAC) in the current version of HLF is not comprehensive enough to address various requirements that arise when multiple organizations interact in an enterprise setting. To address those shortcomings, in this paper, we develop and present methods for providing full ABAC functionality in Hyperledger Fabric. Performance evaluation under different network configurations using the Hyperledger Caliper benchmarking tool shows that the proposed approach is quite efficient in practice.
