Search | Portal Regional da BVS (VHL Regional Portal)

Results 1 - 10 of 10

The Costs of Anonymization: Case Study Using Clinical Data.

Pilgram, Lisa; Meurers, Thierry; Malin, Bradley; Schaeffner, Elke; Eckardt, Kai-Uwe; Prasser, Fabian.

J Med Internet Res ; 26: e49445, 2024 04 24.

Article in English | MEDLINE | ID: mdl-38657232

ABSTRACT

BACKGROUND: Sharing data from clinical studies can accelerate scientific progress, improve transparency, and increase the potential for innovation and collaboration. However, privacy concerns remain a barrier to data sharing. Certain concerns, such as reidentification risk, can be addressed through the application of anonymization algorithms, whereby data are altered so that it is no longer reasonably related to a person. Yet, such alterations have the potential to influence the data set's statistical properties, such that the privacy-utility trade-off must be considered. This has been studied in theory, but evidence based on real-world individual-level clinical data is rare, and anonymization has not broadly been adopted in clinical practice. OBJECTIVE: The goal of this study is to contribute to a better understanding of anonymization in the real world by comprehensively evaluating the privacy-utility trade-off of differently anonymized data using data and scientific results from the German Chronic Kidney Disease (GCKD) study. METHODS: The GCKD data set extracted for this study consists of 5217 records and 70 variables. A 2-step procedure was followed to determine which variables constituted reidentification risks. To capture a large portion of the risk-utility space, we decided on risk thresholds ranging from 0.02 to 1. The data were then transformed via generalization and suppression, and the anonymization process was varied using a generic and a use case-specific configuration. To assess the utility of the anonymized GCKD data, general-purpose metrics (ie, data granularity and entropy), as well as use case-specific metrics (ie, reproducibility), were applied. Reproducibility was assessed by measuring the overlap of the 95% CI lengths between anonymized and original results. RESULTS: Reproducibility measured by 95% CI overlap was higher than utility obtained from general-purpose metrics. 
For example, granularity varied between 68.2% and 87.6%, and entropy varied between 25.5% and 46.2%, whereas the average 95% CI overlap was above 90% for all risk thresholds applied. A nonoverlapping 95% CI was detected in 6 estimates across all analyses, but the overwhelming majority of estimates exhibited an overlap over 50%. The use case-specific configuration outperformed the generic one in terms of actual utility (ie, reproducibility) at the same level of privacy. CONCLUSIONS: Our results illustrate the challenges that anonymization faces when aiming to support multiple likely and possibly competing uses, while use case-specific anonymization can provide greater utility. This aspect should be taken into account when evaluating the associated costs of anonymized data and attempting to maintain sufficiently high levels of privacy for anonymized data. TRIAL REGISTRATION: German Clinical Trials Register DRKS00003971; https://drks.de/search/en/trial/DRKS00003971. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR2-10.1093/ndt/gfr456.
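
The abstract does not spell out how the 95% CI overlap is computed. As a rough illustration only, the sketch below assumes one common definition: the length of the intersection of the two intervals, divided by each interval's length and averaged over both. The function name `ci_overlap` and the example intervals are hypothetical and are not taken from the study.

```python
def ci_overlap(original, anonymized):
    """Return the relative overlap of two confidence intervals, in [0, 1].

    Assumed definition (not confirmed by the abstract): the length of the
    intersection divided by each interval's length, averaged over both.
    """
    lo_o, hi_o = original
    lo_a, hi_a = anonymized
    if hi_o <= lo_o or hi_a <= lo_a:
        raise ValueError("each interval must have positive length")
    # Length of the intersection; zero when the intervals are disjoint.
    intersection = max(0.0, min(hi_o, hi_a) - max(lo_o, lo_a))
    return 0.5 * (intersection / (hi_o - lo_o) + intersection / (hi_a - lo_a))


# Hypothetical intervals, for illustration only.
identical = ci_overlap((1.0, 2.0), (1.0, 2.0))  # same interval twice
disjoint = ci_overlap((1.0, 2.0), (3.0, 4.0))   # no shared range
partial = ci_overlap((0.0, 2.0), (1.0, 3.0))    # half of each interval shared
```

Under this assumed definition, a nonoverlapping pair of intervals scores 0 and a perfectly reproduced interval scores 1, which matches how the abstract talks about estimates with no overlap versus overlap above 50%.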

Subjects

Data Anonymization , Humans , Renal Insufficiency, Chronic/therapy , Information Dissemination/methods , Algorithms , Germany , Confidentiality , Privacy

Data Provenance in Biomedical Research: Scoping Review.

Johns, Marco; Meurers, Thierry; Wirth, Felix N; Haber, Anna C; Müller, Armin; Halilovic, Mehmed; Balzer, Felix; Prasser, Fabian.

J Med Internet Res ; 25: e42289, 2023 03 27.

Article in English | MEDLINE | ID: mdl-36972116

ABSTRACT

BACKGROUND: Data provenance refers to the origin, processing, and movement of data. Reliable and precise knowledge about data provenance has great potential to improve reproducibility as well as quality in biomedical research and, therefore, to foster good scientific practice. However, despite the increasing interest on data provenance technologies in the literature and their implementation in other disciplines, these technologies have not yet been widely adopted in biomedical research. OBJECTIVE: The aim of this scoping review was to provide a structured overview of the body of knowledge on provenance methods in biomedical research by systematizing articles covering data provenance technologies developed for or used in this application area; describing and comparing the functionalities as well as the design of the provenance technologies used; and identifying gaps in the literature, which could provide opportunities for future research on technologies that could receive more widespread adoption. METHODS: Following a methodological framework for scoping studies and the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines, articles were identified by searching the PubMed, IEEE Xplore, and Web of Science databases and subsequently screened for eligibility. We included original articles covering software-based provenance management for scientific research published between 2010 and 2021. A set of data items was defined along the following five axes: publication metadata, application scope, provenance aspects covered, data representation, and functionalities. The data items were extracted from the articles, stored in a charting spreadsheet, and summarized in tables and figures. RESULTS: We identified 44 original articles published between 2010 and 2021. We found that the solutions described were heterogeneous along all axes. 
We also identified relationships among motivations for the use of provenance information, feature sets (capture, storage, retrieval, visualization, and analysis), and implementation details such as the data models and technologies used. The important gap that we identified is that only a few publications address the analysis of provenance data or use established provenance standards, such as PROV. CONCLUSIONS: The heterogeneity of provenance methods, models, and implementations found in the literature points to the lack of a unified understanding of provenance concepts for biomedical data. Providing a common framework, a biomedical reference, and benchmarking data sets could foster the development of more comprehensive provenance solutions.

Subjects

Biomedical Research , Humans , Metadata , PubMed , Reproducibility of Results , Software

Generating evidence on privacy outcomes to inform privacy risk management: A way forward?

Strech, Daniel; Haven, Tamarinde; Madai, Vince I; Meurers, Thierry; Prasser, Fabian.

J Biomed Inform ; 137: 104257, 2023 01.

Article in English | MEDLINE | ID: mdl-36462598

ABSTRACT

Effective and efficient privacy risk management (PRM) is a necessary condition to support digitalization in health care and secondary use of patient data in research. To reduce privacy risks, current PRM frameworks are rooted in an approach trying to reduce undesired technical/organizational outcomes such as broken encryption or unintentional data disclosure. Comparing this with risk management in preventive or therapeutic medicine, a key difference becomes apparent: in health-related risk management, medicine focuses on person-specific health outcomes, whereas PRM mostly targets more indirect, technical/organizational outcomes. In this paper, we illustrate and discuss how a PRM approach based on evidence of person-specific privacy outcomes might look using three consecutive steps: i) a specification of undesired person-specific privacy outcomes, ii) empirical assessments of their frequency and severity, and iii) empirical studies on how effectively the available PRM interventions reduce their frequency or severity. After an introduction of these three steps, we cover their status quo and outline open questions and PRM-specific challenges in need of further conceptual clarification and feasibility studies. Specific challenges of an outcome-oriented approach to PRM include the potential delays between concrete threats manifesting and the resulting person/group-specific privacy outcomes. Moreover, new ways of exploiting privacy-sensitive information to harm individuals could be developed in the future. The challenges described are of technical, legal, ethical, financial and resource-oriented nature. In health research, however, there is explicit discussion about how to overcome such challenges to make important outcome-based assessments as feasible as possible. This paper concludes that it might be the time to have this discussion in the PRM field as well.

Subjects

Confidentiality , Privacy , Humans

Statistical biases due to anonymization evaluated in an open clinical dataset from COVID-19 patients.

Koll, Carolin E M; Hopff, Sina M; Meurers, Thierry; Lee, Chin Huang; Kohls, Mirjam; Stellbrink, Christoph; Thibeault, Charlotte; Reinke, Lennart; Steinbrecher, Sarah; Schreiber, Stefan; Mitrov, Lazar; Frank, Sandra; Miljukov, Olga; Erber, Johanna; Hellmuth, Johannes C; Reese, Jens-Peter; Steinbeis, Fridolin; Bahmer, Thomas; Hagen, Marina; Meybohm, Patrick; Hansch, Stefan; Vadász, István; Krist, Lilian; Jiru-Hillmann, Steffi; Prasser, Fabian; Vehreschild, Jörg Janne.

Sci Data ; 9(1): 776, 2022 12 21.

Article in English | MEDLINE | ID: mdl-36543828

ABSTRACT

Anonymization has the potential to foster the sharing of medical data. State-of-the-art methods use mathematical models to modify data to reduce privacy risks. However, the degree of protection must be balanced against the impact on statistical properties. We studied an extreme case of this trade-off: the statistical validity of an open medical dataset based on the German National Pandemic Cohort Network (NAPKON), which was prepared for publication using a strong anonymization procedure. Descriptive statistics and results of regression analyses were compared before and after anonymization of multiple variants of the original dataset. Despite significant differences in value distributions, the statistical bias was found to be small in all cases. In the regression analyses, the median absolute deviations of the estimated adjusted odds ratios for different sample sizes ranged from 0.01 [minimum = 0, maximum = 0.58] to 0.52 [minimum = 0.25, maximum = 0.91]. Disproportionate impact on the statistical properties of data is a common argument against the use of anonymization. Our analysis demonstrates that anonymization can actually preserve validity of statistical results in relatively low-dimensional data.
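
The reported figures can be read as summaries of how far each adjusted odds ratio moved after anonymization. The sketch below assumes that reading, namely the median, minimum, and maximum of the absolute differences between original and anonymized estimates; the abstract does not confirm the exact computation. The function name `deviation_summary` and the numbers are hypothetical.

```python
from statistics import median


def deviation_summary(original_ors, anonymized_ors):
    """Summarize absolute differences between paired odds-ratio estimates.

    Assumption: "median absolute deviation" means the median of
    |anonymized - original| over all paired estimates.
    """
    if not original_ors or len(original_ors) != len(anonymized_ors):
        raise ValueError("need two non-empty lists of equal length")
    deviations = [abs(a - o) for o, a in zip(original_ors, anonymized_ors)]
    return {
        "median": median(deviations),
        "minimum": min(deviations),
        "maximum": max(deviations),
    }


# Hypothetical odds ratios, for illustration only.
summary = deviation_summary([1.0, 2.0, 0.5], [1.5, 2.0, 0.25])
```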

Subjects

COVID-19 , Humans , Bias , Data Anonymization , Models, Theoretical , Privacy , Data Interpretation, Statistical , Datasets as Topic

Generation and Evaluation of Synthetic Data in a University Hospital Setting.

Kaabachi, Bayrem; Despraz, Jérémie; Meurers, Thierry; Prasser, Fabian; Raisaro, Jean Louis.

Stud Health Technol Inform ; 294: 141-142, 2022 May 25.

Article in English | MEDLINE | ID: mdl-35612040

ABSTRACT

In this study, we propose a unified evaluation framework for systematically assessing the utility-privacy trade-off of synthetic data generation (SDG) models. These SDG models are adapted to deal with longitudinal or tabular data stemming from electronic health records (EHR) containing both discrete and numeric features. Our evaluation framework considers different data sharing scenarios and attacker models.

Subjects

Electronic Health Records , Privacy , Hospitals, University , Humans

A scalable software solution for anonymizing high-dimensional biomedical data.

Meurers, Thierry; Bild, Raffael; Do, Kieu-Mi; Prasser, Fabian.

Gigascience ; 10(10)2021 10 04.

Article in English | MEDLINE | ID: mdl-34605868

ABSTRACT

BACKGROUND: Data anonymization is an important building block for ensuring privacy and fosters the reuse of data. However, transforming the data in a way that preserves the privacy of subjects while maintaining a high degree of data quality is challenging and particularly difficult when processing complex datasets that contain a high number of attributes. In this article we present how we extended the open source software ARX to improve its support for high-dimensional, biomedical datasets. FINDINGS: For improving ARX's capability to find optimal transformations when processing high-dimensional data, we implement 2 novel search algorithms. The first is a greedy top-down approach and is oriented on a formally implemented bottom-up search. The second is based on a genetic algorithm. We evaluated the algorithms with different datasets, transformation methods, and privacy models. The novel algorithms mostly outperformed the previously implemented bottom-up search. In addition, we extended the GUI to provide a high degree of usability and performance when working with high-dimensional datasets. CONCLUSION: With our additions we have significantly enhanced ARX's ability to handle high-dimensional data in terms of processing performance as well as usability and thus can further facilitate data sharing.
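
To make the idea of a data transformation concrete, the toy sketch below generalizes a numeric attribute into intervals and then suppresses records whose group is too small. It is plain Python and does not use ARX or its API. It also picks k-anonymity as the privacy model, which is an assumption, since the abstract does not name the models evaluated. It does not implement the search algorithms described above. All names and records are hypothetical.

```python
from collections import Counter


def generalize_age(age, width):
    """Replace an exact age with an interval of the given width, e.g. 47 -> '40-49'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def anonymize(records, k, width):
    """Generalize ages, then suppress records whose group has fewer than k members."""
    generalized = [(generalize_age(age, width), sex) for age, sex in records]
    group_sizes = Counter(generalized)
    return [record for record in generalized if group_sizes[record] >= k]


# Hypothetical (age, sex) records, for illustration only.
records = [(41, "F"), (47, "F"), (44, "M"), (49, "M"), (83, "F")]
released = anonymize(records, k=2, width=10)
```

Here the single record in the 80-89 group is suppressed, while the four remaining records form two groups of size 2. A search algorithm of the kind the abstract describes would choose among many such generalization levels across many attributes; this sketch fixes one level by hand.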

Subjects

Data Anonymization , Privacy , Algorithms , Humans , Information Dissemination , Software

A Flying Platform to Investigate Neuronal Correlates of Navigation in the Honey Bee (Apis mellifera).

Paffhausen, Benjamin H; Petrasch, Julian; Wild, Benjamin; Meurers, Thierry; Schülke, Tobias; Polster, Johannes; Fuchs, Inga; Drexler, Helmut; Kuriatnyk, Oleksandra; Menzel, Randolf; Landgraf, Tim.

Front Behav Neurosci ; 15: 690571, 2021.

Article in English | MEDLINE | ID: mdl-34354573

ABSTRACT

Navigating animals combine multiple perceptual faculties, learn during exploration, retrieve multi-facetted memory contents, and exhibit goal-directedness as an expression of their current needs and motivations. Navigation in insects has been linked to a variety of underlying strategies such as path integration, view familiarity, visual beaconing, and goal-directed orientation with respect to previously learned ground structures. Most works, however, study navigation either from a field perspective, analyzing purely behavioral observations, or combine computational models with neurophysiological evidence obtained from lab experiments. The honey bee (Apis mellifera) has long been a popular model in the search for neural correlates of complex behaviors and exhibits extraordinary navigational capabilities. However, the neural basis for bee navigation has not yet been explored under natural conditions. Here, we propose a novel methodology to record from the brain of a copter-mounted honey bee. This way, the animal experiences natural multimodal sensory inputs in a natural environment that is familiar to her. We have developed a miniaturized electrophysiology recording system which is able to record spikes in the presence of time-varying electric noise from the copter's motors and rotors, and devised an experimental procedure to record from mushroom body extrinsic neurons (MBENs). We analyze the resulting electrophysiological data combined with a reconstruction of the animal's visual perception and find that the neural activity of MBENs is linked to sharp turns, possibly related to the relative motion of visual features. This method is a significant technological step toward recording brain activity of navigating honey bees under natural conditions. By providing all system specifications in an online repository, we hope to close a methodological gap and stimulate further research informing future computational models of insect navigation.

Privacy-preserving data sharing infrastructures for medical research: systematization and comparison.

Wirth, Felix Nikolaus; Meurers, Thierry; Johns, Marco; Prasser, Fabian.

BMC Med Inform Decis Mak ; 21(1): 242, 2021 08 12.

Article in English | MEDLINE | ID: mdl-34384406

ABSTRACT

BACKGROUND: Data sharing is considered a crucial part of modern medical research. Unfortunately, despite its advantages, it often faces obstacles, especially data privacy challenges. As a result, various approaches and infrastructures have been developed that aim to ensure that patients and research participants remain anonymous when data is shared. However, privacy protection typically comes at a cost, e.g. restrictions regarding the types of analyses that can be performed on shared data. What is lacking is a systematization making the trade-offs taken by different approaches transparent. The aim of the work described in this paper was to develop a systematization for the degree of privacy protection provided and the trade-offs taken by different data sharing methods. Based on this contribution, we categorized popular data sharing approaches and identified research gaps by analyzing combinations of promising properties and features that are not yet supported by existing approaches. METHODS: The systematization consists of different axes. Three axes relate to privacy protection aspects and were adopted from the popular Five Safes Framework: (1) safe data, addressing privacy at the input level, (2) safe settings, addressing privacy during shared processing, and (3) safe outputs, addressing privacy protection of analysis results. Three additional axes address the usefulness of approaches: (4) support for de-duplication, to enable the reconciliation of data belonging to the same individuals, (5) flexibility, to be able to adapt to different data analysis requirements, and (6) scalability, to maintain performance with increasing complexity of shared data or common analysis processes. 
RESULTS: Using the systematization, we identified three different categories of approaches: distributed data analyses, which exchange anonymous aggregated data, secure multi-party computation protocols, which exchange encrypted data, and data enclaves, which store pooled individual-level data in secure environments for access for analysis purposes. We identified important research gaps, including a lack of approaches enabling the de-duplication of horizontally distributed data or providing a high degree of flexibility. CONCLUSIONS: There are fundamental differences between different data sharing approaches and several gaps in their functionality that may be interesting to investigate in future work. Our systematization can make the properties of privacy-preserving data sharing infrastructures more transparent and support decision makers and regulatory authorities with a better understanding of the trade-offs taken.

Subjects

Biomedical Research , Privacy , Computer Security , Humans , Information Dissemination

Design and evaluation of a data anonymization pipeline to promote Open Science on COVID-19.

Jakob, Carolin E M; Kohlmayer, Florian; Meurers, Thierry; Vehreschild, Jörg Janne; Prasser, Fabian.

Sci Data ; 7(1): 435, 2020 12 10.

Article in English | MEDLINE | ID: mdl-33303746

ABSTRACT

The Lean European Open Survey on SARS-CoV-2 Infected Patients (LEOSS) is a European registry for studying the epidemiology and clinical course of COVID-19. To support evidence-generation at the rapid pace required in a pandemic, LEOSS follows an Open Science approach, making data available to the public in real-time. To protect patient privacy, quantitative anonymization procedures are used to protect the continuously published data stream consisting of 16 variables on the course and therapy of COVID-19 from singling out, inference and linkage attacks. We investigated the bias introduced by this process and found that it has very little impact on the quality of output data. Current laws do not specify requirements for the application of formal anonymization methods, there is a lack of guidelines with clear recommendations and few real-world applications of quantitative anonymization procedures have been described in the literature. We therefore believe that our work can help others with developing urgently needed anonymization pipelines for their projects.

Subjects

COVID-19/epidemiology , Data Anonymization , Pandemics , Registries , Adult , Aged , Aged, 80 and over , Biomedical Research , Confidentiality , Datasets as Topic , Female , Humans , Male , Middle Aged

Citizen-Centered Mobile Health Apps Collecting Individual-Level Spatial Data for Infectious Disease Management: Scoping Review.

Wirth, Felix Nikolaus; Johns, Marco; Meurers, Thierry; Prasser, Fabian.

JMIR Mhealth Uhealth ; 8(11): e22594, 2020 11 10.

Article in English | MEDLINE | ID: mdl-33074833

ABSTRACT

BACKGROUND: The novel coronavirus SARS-CoV-2 rapidly spread around the world, causing the disease COVID-19. To contain the virus, much hope is placed on participatory surveillance using mobile apps, such as automated digital contact tracing, but broad adoption is an important prerequisite for associated interventions to be effective. Data protection aspects are a critical factor for adoption, and privacy risks of solutions developed often need to be balanced against their functionalities. This is reflected by an intensive discussion in the public and the scientific community about privacy-preserving approaches. OBJECTIVE: Our aim is to inform the current discussions and to support the development of solutions providing an optimal balance between privacy protection and pandemic control. To this end, we present a systematic analysis of existing literature on citizen-centered surveillance solutions collecting individual-level spatial data. Our main hypothesis is that there are dependencies between the following dimensions: the use cases supported, the technology used to collect spatial data, the specific diseases focused on, and data protection measures implemented. METHODS: We searched PubMed and IEEE Xplore with a search string combining terms from the area of infectious disease management with terms describing spatial surveillance technologies to identify studies published between 2010 and 2020. After a two-step eligibility assessment process, 27 articles were selected for the final analysis. We collected data on the four dimensions described as well as metadata, which we then analyzed by calculating univariate and bivariate frequency distributions. RESULTS: We identified four different use cases, which focused on individual surveillance and public health (most common: digital contact tracing). We found that the solutions described were highly specialized, with 89% (24/27) of the articles covering one use case only. 
Moreover, we identified eight different technologies used for collecting spatial data (most common: GPS receivers) and five different diseases covered (most common: COVID-19). Finally, we also identified six different data protection measures (most common: pseudonymization). As hypothesized, we identified relationships between the dimensions. We found that for highly infectious diseases such as COVID-19 the most common use case was contact tracing, typically based on Bluetooth technology. For managing vector-borne diseases, use cases require absolute positions, which are typically measured using GPS. Absolute spatial locations are also important for further use cases relevant to the management of other infectious diseases. CONCLUSIONS: We see a large potential for future solutions supporting multiple use cases by combining different technologies (eg, Bluetooth and GPS). For this to be successful, however, adequate privacy-protection measures must be implemented. Technologies currently used in this context can probably not offer enough protection. We, therefore, recommend that future solutions should consider the use of modern privacy-enhancing techniques (eg, from the area of secure multiparty computing and differential privacy).

Subjects

COVID-19/prevention & control , COVID-19/transmission , Contact Tracing/methods , Mobile Applications , Public Health Surveillance/methods , Spatio-Temporal Analysis , Computer Security , Humans , Pandemics , Privacy
