1.
Gigascience; 10(10), 2021 Oct 04.
Article in English | MEDLINE | ID: mdl-34605868

ABSTRACT

BACKGROUND: Data anonymization is an important building block for ensuring privacy and fosters the reuse of data. However, transforming data in a way that preserves the privacy of subjects while maintaining a high degree of data quality is challenging, and particularly difficult when processing complex datasets that contain a high number of attributes. In this article we present how we extended the open-source software ARX to improve its support for high-dimensional biomedical datasets. FINDINGS: To improve ARX's ability to find optimal transformations when processing high-dimensional data, we implemented two novel search algorithms. The first is a greedy top-down approach modeled on the previously implemented bottom-up search. The second is based on a genetic algorithm. We evaluated the algorithms with different datasets, transformation methods, and privacy models. The novel algorithms mostly outperformed the previously implemented bottom-up search. In addition, we extended the GUI to provide a high degree of usability and performance when working with high-dimensional datasets. CONCLUSION: With our additions we have significantly enhanced ARX's ability to handle high-dimensional data, in terms of both processing performance and usability, and can thus further facilitate data sharing.
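The abstract does not detail the two search strategies. As a rough Python sketch of the genetic-algorithm idea only, the following evolves vectors of per-attribute generalization levels against a k-anonymity-style feasibility check; the toy data, generalization scheme, fitness function and parameters are all assumptions, not ARX's actual implementation.

```python
import random
from collections import Counter

MAX_LEVELS = [2, 3, 2, 4]   # maximum generalization level per attribute (assumed)
K = 5                       # required minimum group size (k-anonymity-style check)

def generalize(record, levels):
    # Toy generalization: truncating a string value removes detail.
    return tuple(v[:max(1, len(v) - lvl)] for v, lvl in zip(record, levels))

def fitness(levels, data):
    groups = Counter(generalize(r, levels) for r in data)
    if min(groups.values()) < K:      # infeasible: privacy check violated
        return float("-inf")
    return -sum(levels)               # feasible: prefer less generalization

def genetic_search(data, pop_size=20, generations=50):
    pop = [[random.randint(0, m) for m in MAX_LEVELS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(ind, data), reverse=True)
        survivors = pop[:pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)          # select two parents
            cut = random.randrange(1, len(MAX_LEVELS))  # one-point crossover
            child = a[:cut] + b[cut:]
            i = random.randrange(len(child))            # random mutation
            child[i] = random.randint(0, MAX_LEVELS[i])
            children.append(child)
        pop = survivors + children
    return max(pop, key=lambda ind: fitness(ind, data))

# Synthetic records with four string-valued attributes:
data = [(f"816{random.randint(0, 9)}5", "1950", "F", "DE-BY") for _ in range(100)]
print(genetic_search(data))
```

In ARX itself, candidate solutions are transformations in a generalization lattice and fitness is derived from the configured quality and privacy models; the sketch only mirrors that search structure.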


Subjects
Data Anonymization, Privacy, Algorithms, Humans, Information Dissemination, Software
2.
Stud Health Technol Inform; 270: 68-72, 2020 Jun 16.
Article in English | MEDLINE | ID: mdl-32570348

ABSTRACT

Modern biomedical research is increasingly data-driven. To create the required big datasets, health data needs to be shared or reused, which often leads to privacy challenges. Data anonymization is an important protection method in which data is transformed such that privacy guarantees can be provided according to formal models. For applications in practice, anonymization methods need to be integrated into scalable and reliable tools. In this work, we tackle the problem of achieving reliability. Privacy models often involve mathematical definitions over real numbers, which are typically approximated with floating-point numbers when implemented in software. We study the effect on the privacy guarantees provided and present a reliable computing framework, based on fractional and interval arithmetic, for improving the reliability of implementations. Extensive evaluations demonstrate that reliable data anonymization is practical and can be achieved with minor impacts on execution times and data utility.
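The floating-point problem the paper addresses can be illustrated with a borderline threshold check; the snippet below contrasts it with exact rational arithmetic using Python's standard-library fractions module. This is a minimal illustration of the failure mode, not the paper's computing framework.

```python
from fractions import Fraction

# A borderline comparison of the kind that appears in privacy-model checks
# (e.g. "frequency <= threshold") can flip under floating-point rounding:
print(0.1 + 0.2 <= 0.3)  # False: 0.1 + 0.2 evaluates to 0.30000000000000004

# The same comparison with exact rational arithmetic holds by construction:
print(Fraction(1, 10) + Fraction(2, 10) <= Fraction(3, 10))  # True

# Applied to a threshold check, e.g. a group frequency against 1/k:
count, group_size, k = 3, 9, 3
print(Fraction(count, group_size) <= Fraction(1, k))  # exact: True
print(count / group_size <= 1 / k)                    # float: holds here, but
                                                      # not guaranteed in general
```

Interval arithmetic extends the same idea to operations whose results are not rational (e.g. logarithms), by bounding the true value between two representable numbers.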


Subjects
Biomedical Research, Data Anonymization, Confidentiality, Privacy, Reproducibility of Results, Software
3.
Stud Health Technol Inform; 270: 193-197, 2020 Jun 16.
Article in English | MEDLINE | ID: mdl-32570373

ABSTRACT

Biomedical research has become data-driven. To create the required big datasets, health data needs to be shared or reused outside the context of its initial purpose, which leads to significant privacy challenges. Data anonymization is an important protection method in which data is transformed such that privacy guarantees can be provided according to formal models. For applications in practice, anonymization methods need to be integrated into scalable and robust tools. In this work, we focus on the problem of scalability. Protecting biomedical data from inference attacks is challenging, in particular for numeric data. An important privacy model in this context is t-closeness, which has also been defined for attribute values that are totally ordered. However, directly implementing a scalable algorithm from the mathematical definition of the model proves difficult. In this paper we therefore present a series of optimizations that can be used to achieve efficiency in production use. An experimental evaluation shows that our approach reduces the execution times of anonymization processes involving t-closeness by up to a factor of two.
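For context, t-closeness demands that the distribution of a sensitive attribute within each equivalence class lie within a distance t of its distribution in the overall table. The sketch below is a plain reference implementation of the textbook Earth Mover's Distance for totally ordered attributes (the ordered-distance formula of Li et al.); the paper's scalability optimizations are not reproduced here.

```python
def emd_ordered(overall, group, domain):
    """Earth Mover's Distance between two distributions over a totally
    ordered domain: D = (1/(m-1)) * sum_i |sum_{j<=i} (p_j - q_j)|."""
    emd, carry = 0.0, 0.0
    for v in domain:                  # sweep the domain from smallest to largest
        carry += overall.get(v, 0.0) - group.get(v, 0.0)
        emd += abs(carry)             # cost of moving the accumulated mass
    return emd / (len(domain) - 1)

def satisfies_t_closeness(overall, group, domain, t):
    return emd_ordered(overall, group, domain) <= t

# Example: income brackets in the whole table vs. one equivalence class.
domain  = [1, 2, 3, 4, 5]
overall = {1: 0.2, 2: 0.2, 3: 0.2, 4: 0.2, 5: 0.2}
group   = {1: 0.5, 2: 0.5}
print(emd_ordered(overall, group, domain))                 # 0.375
print(satisfies_t_closeness(overall, group, domain, 0.4))  # True
```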


Subjects
Disclosure, Biomedical Research, Data Anonymization, Privacy
4.
BMC Med Inform Decis Mak; 20(1): 103, 2020 Jun 05.
Article in English | MEDLINE | ID: mdl-32503529

ABSTRACT

BACKGROUND: The aim of the German Medical Informatics Initiative (MII) is to establish a national infrastructure for integrating and sharing health data. To this end, Data Integration Centers are being set up at university medical centers to address data harmonization, information security and data protection. To capture patient consent, a common informed consent template has been developed. It consists of different modules addressing permissions for using data and biosamples. On the technical level, a common digital representation of the information from signed consent templates is needed. As the partners in the initiative are free to adopt different solutions for managing consent information (e.g. IHE BPPC or HL7 FHIR Consent Resources), we had to develop an interoperability layer. METHODS: First, we compiled an overview of the data items required to reflect the information from the MII consent template as well as patient preferences and derived permissions. Next, we created entity-relationship diagrams to formally describe the conceptual data model underlying the relevant items. We then compared this data model to conceptual models describing representations of consent information in different interoperability standards. We used the result of this comparison to derive an interoperable representation that can be mapped to common standards. RESULTS: The digital representation needs to capture the following information: (1) the version of the consent, (2) the consent status for each module, and (3) the period of validity of the status. We found that there is no generally accepted solution for representing status information in a manner interoperable with all relevant standards. Hence, we developed a pragmatic solution comprising codes that describe combinations of modules with a basic set of status labels. We propose to maintain these codes in a public registry called ART-DECOR. We present concrete technical implementations of our approach using HL7 FHIR and IHE BPPC, which are also compatible with the open-source consent management software gICS. CONCLUSIONS: The proposed digital representation is (1) generic enough to capture the relevant information from a wide range of consent documents and data use regulations and (2) interoperable with common technical standards. We plan to extend our model to include more fine-grained status codes and rules for automated access control.
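The three items named under RESULTS can be pictured with a small data structure. The Python sketch below is illustrative only: the class and field names, status labels, module code and the permits helper are invented for this example, and are not the registry codes maintained in ART-DECOR nor the FHIR/BPPC/gICS data models.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum

class Status(Enum):           # basic set of status labels (assumed)
    PERMIT = "permit"
    DENY = "deny"
    WITHDRAWN = "withdrawn"

@dataclass
class ModuleConsent:
    module_code: str          # (2) consent status is tracked per module
    status: Status
    valid_from: date          # (3) period of validity of the status
    valid_until: date

@dataclass
class ConsentRecord:
    template_version: str     # (1) version of the consent template
    modules: list

    def permits(self, module_code, on_day):
        return any(m.module_code == module_code and m.status is Status.PERMIT
                   and m.valid_from <= on_day <= m.valid_until
                   for m in self.modules)

# Hypothetical module code "MDAT_USE" for "use of medical data":
rec = ConsentRecord("v1.0", [ModuleConsent("MDAT_USE", Status.PERMIT,
                                           date(2021, 1, 1), date(2026, 1, 1))])
print(rec.permits("MDAT_USE", date(2023, 6, 1)))   # True
```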


Subjects
Computer Security, Informed Consent, Medical Informatics, Germany, Humans, Software
5.
BMC Med Inform Decis Mak; 20(1): 29, 2020 Feb 11.
Article in English | MEDLINE | ID: mdl-32046701

ABSTRACT

BACKGROUND: Modern data-driven medical research promises to provide new insights into the development and course of disease and to enable novel methods of clinical decision support. To realize this, machine learning models can be trained to make predictions from clinical, paraclinical and biomolecular data. In this process, privacy protection and regulatory requirements need careful consideration, as the resulting models may leak sensitive personal information. To counter this threat, a wide range of methods for integrating machine learning with formal methods of privacy protection have been proposed. However, there is a significant lack of practical tools to create and evaluate such privacy-preserving models. In this software article, we report on our ongoing efforts to bridge this gap. RESULTS: We have extended the well-known ARX anonymization tool for biomedical data with machine learning techniques to support the creation of privacy-preserving prediction models. Our methods are particularly well suited for applications in biomedicine, as they preserve the truthfulness of data (e.g. no noise is added) and they are intuitive and relatively easy to explain to non-experts. Moreover, our implementation is highly versatile, as it supports binomial and multinomial target variables, different types of prediction models and a wide range of privacy protection techniques. All methods have been integrated into a sound framework that supports the creation, evaluation and refinement of models through intuitive graphical user interfaces. To demonstrate the broad applicability of our solution, we present three case studies in which we created and evaluated different types of privacy-preserving prediction models for breast cancer diagnosis, diagnosis of acute inflammation of the urinary system, and prediction of the contraceptive method used by women. In this process, we also used a wide range of privacy models (k-anonymity, differential privacy and a game-theoretic approach) as well as different data transformation techniques. CONCLUSIONS: With the tool presented in this article, accurate prediction models can be created that preserve the privacy of the individuals represented in the training set under a variety of threat scenarios. Our implementation is available as open-source software.
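A minimal sketch of the core idea follows: truthful generalization of quasi-identifiers (no noise added) and then training on the coarsened records. The data, the 10-year age band and the trivial frequency-based classifier are assumptions for illustration; ARX's actual framework supports full prediction models with formal privacy guarantees.

```python
from collections import Counter, defaultdict

def generalize_age(age, band=10):
    lo = (age // band) * band
    return f"{lo}-{lo + band - 1}"   # truthful: the band contains the true age

# Records: (age, sex, class label), invented for illustration.
records = [(34, "F", 1), (37, "F", 1), (52, "M", 0), (58, "M", 1), (31, "F", 0)]

# "Training": class frequencies per generalized cell act as the model.
model = defaultdict(Counter)
for age, sex, label in records:
    model[(generalize_age(age), sex)][label] += 1

def predict(age, sex):
    cell = model.get((generalize_age(age), sex))
    return cell.most_common(1)[0][0] if cell else None

print(predict(36, "F"))   # -> 1, the majority class of the 30-39/F cell
```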


Subjects
Confidentiality, Data Anonymization, Clinical Decision Support Systems, Statistical Models, Software, Biomedical Research, Humans, Machine Learning, ROC Curve, Reproducibility of Results
6.
Int J Med Inform; 126: 72-81, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31029266

ABSTRACT

BACKGROUND: Modern data-driven approaches to medical research require patient-level information at comprehensive depth and breadth. To create the required big datasets, information from disparate sources can be integrated into clinical and translational warehouses. This is typically implemented with Extract, Transform, Load (ETL) processes, which access, harmonize and upload data into the analytics platform. OBJECTIVE: Privacy protection needs careful consideration when data is pooled or reused for secondary purposes, and data anonymization is an important protection mechanism. However, common ETL environments do not support anonymization, and common anonymization tools cannot easily be integrated into ETL workflows. The objective of the work described in this article was to bridge this gap. METHODS: Our main design goals were (1) to base the anonymization process on expert-level risk assessment methodologies, (2) to use transformation methods that preserve both the truthfulness of data and its schematic properties (e.g. data types), (3) to implement a method that is easy to understand and intuitive to configure, and (4) to provide high scalability. RESULTS: We designed a novel and efficient anonymization process and implemented a plugin for the Pentaho Data Integration (PDI) platform that enables integrating data anonymization and re-identification risk analyses directly into ETL workflows. By combining different instances into a single ETL process, data can be protected from multiple threats. The plugin supports very large datasets by leveraging the streaming-based processing model of the underlying platform. We present the results of an extensive experimental evaluation and discuss successful applications. CONCLUSIONS: Our work shows that expert-level anonymization methodologies can be integrated into ETL workflows. Our implementation is available under a non-restrictive open-source license and overcomes several limitations of other data anonymization tools.
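The streaming model the plugin relies on can be pictured as a row-by-row transformation step. The sketch below mimics such a step with a Python generator; the field names and masking rules are invented, and the actual plugin is implemented as a PDI step, not in Python.

```python
import csv
import io

def anonymize_step(rows):
    """Streaming step: consume input rows one at a time, emit transformed rows.
    Memory use stays constant regardless of dataset size."""
    for row in rows:
        del row["name"]                                  # drop direct identifier
        row["zip"] = row["zip"][:3] + "**"               # truthful generalization
        row["age"] = str((int(row["age"]) // 10) * 10)   # 10-year bands
        yield row

raw = "name,zip,age\nAlice,81675,34\nBob,80331,52\n"
for out in anonymize_step(csv.DictReader(io.StringIO(raw))):
    print(out)
```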


Subjects
Biomedical Research, Privacy, Algorithms, Datasets as Topic, Humans
7.
BMC Med Inform Decis Mak; 17(1): 30, 2017 Mar 23.
Article in English | MEDLINE | ID: mdl-28330491

ABSTRACT

BACKGROUND: Translational researchers need robust IT solutions to access a range of data types, varying from public datasets to pseudonymised patient information with restricted access granted on a case-by-case basis. The reason for this complication is that managing access policies for sensitive human data must consider issues of data confidentiality, identifiability, extent of consent, and data usage agreements. All of these ethical, social and legal aspects must be incorporated into a differential management of restricted access to sensitive data. METHODS: In this paper we present a pilot system that uses several common open-source software components in a novel combination to coordinate access to heterogeneous biomedical data repositories containing open data (open access) as well as sensitive data (restricted access) in the domain of biobanking and biosample research. Our approach is based on a digital identity federation and software to manage resource access entitlements. RESULTS: The open-source software components were assembled and configured in such a way that they allow for different forms of restricted access according to the protection needs of the data. We tested the resulting pilot infrastructure and assessed its performance, feasibility and reproducibility. CONCLUSIONS: Common open-source software components are sufficient to create a secure system for differential access to sensitive data. The implementation of this system is exemplary for researchers facing similar requirements for restricted-access data. Here we report the experience and lessons learned from our pilot implementation, which may be useful for similar use cases. Furthermore, we discuss possible extensions for more complex scenarios.
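Conceptually, the differential access management described here boils down to comparing a user's entitlements against the protection needs of each repository. The sketch below reduces this to ordered tiers with invented names; the pilot itself uses identity federation and dedicated entitlement-management software rather than such a lookup table.

```python
# Invented protection tiers, ordered by sensitivity.
TIERS = {"open": 0, "registered": 1, "restricted": 2}

# Invented repositories mapped to their protection needs.
DATASETS = {
    "public-annotations":    "open",
    "biosample-metadata":    "registered",
    "pseudonymised-records": "restricted",
}

def may_access(entitlement, dataset):
    """Grant access iff the user's entitlement tier covers the dataset's tier."""
    return TIERS[entitlement] >= TIERS[DATASETS[dataset]]

print(may_access("registered", "public-annotations"))     # True
print(may_access("registered", "pseudonymised-records"))  # False
```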


Subjects
Biological Specimen Banks/standards, Biomedical Research/standards, Computer Security/standards, Datasets as Topic, Translational Medical Research/standards, Humans, Pilot Projects
8.
Stud Health Technol Inform; 228: 312-316, 2016.
Article in English | MEDLINE | ID: mdl-27577394

ABSTRACT

Data sharing plays an important role in modern biomedical research. Due to the inherent sensitivity of health data, patient privacy must be protected. De-identification means transforming a dataset in such a way that it becomes extremely difficult for an attacker to link its records to identified individuals. This can be achieved with different types of data transformations. As transformation impacts the information content of a dataset, it is important to balance the increase in privacy against the decrease in data quality, and models for measuring both aspects are needed. Non-Uniform Entropy is a model for data quality that is frequently recommended for de-identifying health data. In this work we show that it cannot be used in a meaningful way to measure the quality of data that has been transformed with several important types of data transformation. We introduce a generic variant that overcomes this limitation. We performed experiments with real-world datasets, which show that our method provides a unified framework in which the quality of differently transformed data can be compared to find a good, or even optimal, solution to a given data de-identification problem. We have implemented our method in ARX, an open-source anonymization tool for biomedical data.
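For reference, the classic Non-Uniform Entropy measure scores the information loss of each cell as -log2 P(original value | generalized value) and sums over all cells. The sketch below implements this standard formulation for a single attribute; the generic variant the paper introduces is not reproduced here.

```python
from collections import Counter
from math import log2

def non_uniform_entropy(original, generalized):
    """original and generalized are parallel lists of values for one attribute."""
    orig_counts = Counter(original)
    gen_counts = Counter(generalized)
    loss = 0.0
    for v, g in zip(original, generalized):
        p = orig_counts[v] / gen_counts[g]   # P(original value | generalized value)
        loss -= log2(p)
    return loss

ages     = [34, 36, 34, 52, 58]
gen_ages = ["30-39", "30-39", "30-39", "50-59", "50-59"]
print(non_uniform_entropy(ages, gen_ages))   # ~4.75 bits of information loss
```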


Subjects
Confidentiality, Information Dissemination, Information Storage and Retrieval/methods, Information Storage and Retrieval/standards, Quality Control, Biomedical Research
9.
Biopreserv Biobank; 14(4): 298-306, 2016 Aug.
Article in English | MEDLINE | ID: mdl-26977825

ABSTRACT

Biobanks are the biological back end of data-driven medicine, but they lack standards and generic solutions for interoperability and information harmonization. The move toward a global information infrastructure for biobanking demands semantic interoperability through harmonized services and common ontologies. To tackle this issue, the Minimum Information About BIobank data Sharing (MIABIS) standard was developed in 2012 by the Biobanking and BioMolecular Resources Research Infrastructure of Sweden (BBMRI.se). The wide acceptance of the first version of MIABIS encouraged its evolution into a more structured and descriptive standard. In 2013 a working group was formed under the largest research infrastructure for health in Europe, the Biobanking and BioMolecular Resources Research Infrastructure (BBMRI-ERIC), with the remit to continue the development of MIABIS (version 2.0) through a multi-country governance process. MIABIS 2.0 Core has been developed with 22 attributes describing Biobanks, Sample Collections and Studies, according to a modular structure that makes the standard easier to adhere to and to extend. This integration standard will contribute greatly to the discovery and exploitation of biobank resources and lead to a wider and more efficient use of valuable bioresources, thereby speeding up research on human diseases. Many within the European Union have accepted MIABIS 2.0 Core as the de facto biobank information standard.
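The modular structure of MIABIS 2.0 Core (Biobanks, Sample Collections, Studies) can be sketched as nested entities. Only a few example fields are shown below, and their names are illustrative rather than normative; the standard itself defines 22 attributes.

```python
from dataclasses import dataclass, field

@dataclass
class SampleCollection:          # one MIABIS entity
    collection_id: str
    name: str
    material_types: list = field(default_factory=list)

@dataclass
class Study:                     # one MIABIS entity
    study_id: str
    name: str

@dataclass
class Biobank:                   # top-level entity; modular extension points
    biobank_id: str
    name: str
    country: str
    collections: list = field(default_factory=list)
    studies: list = field(default_factory=list)

# Illustrative instance (identifier format invented):
bb = Biobank("SE_ACME", "ACME Biobank", "SE",
             collections=[SampleCollection("coll-1", "Serum collection",
                                           material_types=["Serum"])])
print(bb.collections[0].material_types)   # ['Serum']
```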


Subjects
Biological Specimen Banks/organization & administration, Specimen Handling/standards, Biological Specimen Banks/standards, Factual Databases, European Union, Humans, Information Dissemination, Software
10.
Article in German | MEDLINE | ID: mdl-26809823

ABSTRACT

BACKGROUND: In addition to the Biobanking and BioMolecular resources Research Infrastructure (BBMRI), which is establishing a European research infrastructure for biobanks, a network for large European prospective cohorts (LPC) is being built to facilitate transnational research into important groups of diseases and into health care. One instrument for this is the database "LPC Catalogue", which supports access to the biomaterials of the participating cohorts. OBJECTIVES: To present the LPC Catalogue as a relevant tool for connecting European biobanks. The LPC Catalogue has also been extended to establish compatibility with the existing Minimum Information About BIobank data Sharing (MIABIS) standard and to allow for more detailed search requests. This article describes the LPC Catalogue, its organizational and technical structure, and the aforementioned extensions. MATERIALS AND METHODS: The LPC Catalogue provides a structured overview of the participating LPCs. It offers various retrieval options and a search function. To support more detailed search requests, a new module has been developed, called the "data cube". The provision of data by the cohorts is supported by a "connector" component. RESULTS: The LPC Catalogue contains data on 22 cohorts and more than 3.8 million biosamples. At present, data on the biosamples of three cohorts have been acquired for the cube, which is continuously being expanded. In the BBMRI-LPC, calls for scientific projects using the data and samples of the participating cohorts are currently being carried out, and several proposals have already been approved. CONCLUSIONS: The LPC Catalogue supports transnational access to biosamples. A comparison with existing solutions illustrates the relevance of its functionality.
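The "data cube" can be pictured as pre-aggregated sample counts along a few dimensions that detailed search requests then slice. The sketch below invents the dimensions and counts; the catalogue's real cube schema is not described in the abstract.

```python
# Facts: (cohort, material type, sex, sample count) - all values invented.
FACTS = [
    ("cohort-A", "serum", "F", 1200),
    ("cohort-A", "DNA",   "M",  800),
    ("cohort-B", "serum", "F",  450),
]

def slice_cube(cohort=None, material=None, sex=None):
    """Sum the sample counts matching the given dimension filters (None = any)."""
    return sum(n for c, m, s, n in FACTS
               if (cohort is None or c == cohort)
               and (material is None or m == material)
               and (sex is None or s == sex))

print(slice_cube(material="serum"))   # 1650 serum samples across cohorts
print(slice_cube(cohort="cohort-A"))  # 2000 samples contributed by cohort A
```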


Subjects
Biological Specimen Banks/organization & administration, Biomedical Research/organization & administration, Catalogs as Topic, Database Management Systems/organization & administration, Factual Databases, Interinstitutional Relations, Cohort Studies, Europe, Forecasting, Information Dissemination/methods, Information Storage and Retrieval/methods, Organizational Models, Registries, Specimen Handling/methods