Results 1 - 20 of 65
1.
Digit Health ; 10: 20552076241265219, 2024.
Article in English | MEDLINE | ID: mdl-39130526

ABSTRACT

Objective: Unlocking the potential of routine medical data for clinical research requires the analysis of data from multiple healthcare institutions. However, under German data protection regulations, data often cannot leave the individual institutions, and decentralized approaches are needed. Decentralized studies face challenges regarding coordination, technical infrastructure, interoperability and regulatory compliance. Rare diseases are an important prototype research focus for decentralized data analyses, as patients are rare by definition and adequate cohort sizes can only be reached if data from multiple sites are combined. Methods: Within the project "Collaboration on Rare Diseases", decentralized studies focusing on four rare diseases (cystic fibrosis, phenylketonuria, Kawasaki disease, multisystem inflammatory syndrome in children) were conducted at 17 German university hospitals. To this end, a data management process for decentralized studies was developed by an interdisciplinary team of experts from medicine, public health and data science. Along the process, lessons learned were formulated and discussed. Results: The process consists of eight steps and includes sub-processes for the definition of medical use cases, script development and data management. The lessons learned concern, on the one hand, the organization and administration of the studies (collaboration of experts, use of standardized forms and publication of project information) and, on the other hand, the development of scripts and analyses (dependency on the database, use of standards and open source tools, feedback loops, anonymization). Conclusions: This work captures central challenges, describes possible solutions and can hence serve as a solid basis for the implementation and conduct of similar decentralized studies.

2.
Stud Health Technol Inform ; 316: 1199-1203, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176596

ABSTRACT

Sharing biomedical data for research can help to improve disease understanding and support the development of preventive, diagnostic, and therapeutic methods. However, it is vital to balance the amount of data shared and the sharing mechanism chosen with the privacy protection provided. This requires a detailed understanding of potential adversaries who might attempt to re-identify data and the consequences of their actions. The aim of this paper is to present a comprehensive list of potential types of adversaries, motivations, and harms to targeted individuals. A group of 13 researchers performed a three-step process in a one-day workshop, involving the identification of adversaries, the categorization by motivation, and the deduction of potential harms. The group collected 28 suggestions and categorized them into six types, each associated with several of six distinct harms. The findings align with previous efforts in structuring threat actors and outcomes and we believe that they provide a robust foundation for evaluating re-identification risks and developing protection measures in health data sharing scenarios.


Subjects
Computer Security, Confidentiality, Information Dissemination, Humans
3.
Stud Health Technol Inform ; 316: 1224-1225, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176601

ABSTRACT

The identification of vulnerable records (targets) is an important step for many privacy attacks on protected health data. We implemented and evaluated three outlier metrics for detecting potential targets. Next, we assessed differences and similarities between the top-k targets suggested by the different methods and studied how susceptible those targets are to membership inference attacks on synthetic data. Our results suggest that there is no one-size-fits-all approach and that target selection methods should be chosen based on the type of attack that is to be performed.
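The abstract does not disclose which three outlier metrics were implemented. As a hypothetical illustration of the general idea of target selection, the sketch below scores each record by the rarity of its attribute values (negative log-frequency, summed over attributes) and returns the top-k most outlying records; all data and the metric itself are invented for illustration.

```python
import math
from collections import Counter

def rarity_scores(records):
    """One simple, hypothetical outlier metric: a record's score is the
    sum of the negative log-frequencies of its attribute values, so
    records with uncommon value combinations rank highest."""
    n = len(records)
    # Per-attribute value frequencies across the dataset.
    freqs = [Counter(rec[i] for rec in records) for i in range(len(records[0]))]
    return [sum(-math.log(freqs[i][v] / n) for i, v in enumerate(rec))
            for rec in records]

def top_k_targets(records, k):
    """Return indices of the k most outlying records (potential targets)."""
    scores = rarity_scores(records)
    return sorted(range(len(records)), key=scores.__getitem__, reverse=True)[:k]

records = [
    ("male", "1980"), ("male", "1980"), ("male", "1980"),
    ("female", "1980"),
    ("female", "1955"),  # unusual combination: rarest sex *and* birth year
]
print(top_k_targets(records, 1))  # [4]
```

The paper's observation that different metrics suggest different top-k targets corresponds here to the fact that swapping in another scoring function would reorder the ranking.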


Subjects
Computer Security, Confidentiality, Electronic Health Records, Humans
4.
Stud Health Technol Inform ; 316: 1248-1249, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176607

ABSTRACT

The SARS-CoV-2 pandemic highlighted the importance of fast, collaborative research in biomedicine. Within the ORCHESTRA consortium, we rapidly deployed a pseudonymization service with minimal training and maintenance efforts under time-critical conditions to support a complex, multi-site research project. Over two years, the service was deployed in 13 sites across 11 countries to register more than 10,000 study participants and 15,000 biosamples. In this work, we present lessons learned as part of this process. Most importantly, we learned that common challenges can be overcome by creatively utilizing widely available tools and that having a dedicated partner to manage software rollout and pre-configure software packages for each site fosters the effective implementation.


Subjects
COVID-19, Humans, SARS-CoV-2, Software, Biomedical Research, Pandemics
5.
Orphanet J Rare Dis ; 19(1): 265, 2024 Jul 15.
Article in English | MEDLINE | ID: mdl-39010138

ABSTRACT

BACKGROUND: Globally, researchers are working on projects aiming to enhance the availability of data for rare disease research. While data sharing remains critical, developing suitable methods is challenging due to the specific sensitivity and uniqueness of rare disease data. This creates a dilemma, as there is a lack of both methods and necessary data to create appropriate approaches initially. This work contributes to bridging this gap by providing synthetic datasets that can form the foundation for such developments. METHODS: Using a hierarchical data generation approach parameterised with publicly available statistics, we generated datasets reflecting a random sample of rare disease patients from the United States (US) population. General demographics were obtained from the US Census Bureau, while information on disease prevalence, initial diagnosis, survival rates as well as race and sex ratios were obtained from the information provided by the US Centers for Disease Control and Prevention as well as the scientific literature. The software, which we have named SynthMD, was implemented in Python as open source using libraries such as Faker for generating individual data points. RESULTS: We generated three datasets focusing on three specific rare diseases with broad impact on US citizens, as well as differences in affected genders and racial groups: Sickle Cell Disease, Cystic Fibrosis, and Duchenne Muscular Dystrophy. We present the statistics used to generate the datasets and study the statistical properties of output data. The datasets, as well as the code used to generate them, are available as Open Data and Open Source Software. CONCLUSION: The results of our work can serve as a starting point for researchers and developers working on methods and platforms that aim to improve the availability of rare disease data. 
Potential applications include using the datasets for testing purposes during the implementation of information systems or tailored privacy-enhancing technologies.
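The hierarchical generation approach can be sketched with the standard library alone (SynthMD itself uses libraries such as Faker for individual data points). All parameters below are illustrative placeholders, not the project's actual statistics: the idea is that published sex ratios, race distributions per disease, and typical ages at initial diagnosis drive a sampling process in which later attributes condition on earlier draws.

```python
import random

# Illustrative parameters only, not SynthMD's actual figures.
SEX_RATIO = {"female": 0.5, "male": 0.5}
RACE_BY_DISEASE = {
    "sickle_cell_disease": {"black": 0.9, "other": 0.1},
    "cystic_fibrosis": {"white": 0.9, "other": 0.1},
}
AGE_AT_DIAGNOSIS = {"sickle_cell_disease": (0, 2), "cystic_fibrosis": (0, 5)}

def sample_patient(disease, rng):
    """Draw one synthetic patient; later attributes condition on earlier ones."""
    sex = rng.choices(list(SEX_RATIO), weights=list(SEX_RATIO.values()))[0]
    races = RACE_BY_DISEASE[disease]
    race = rng.choices(list(races), weights=list(races.values()))[0]
    lo, hi = AGE_AT_DIAGNOSIS[disease]
    return {"disease": disease, "sex": sex, "race": race,
            "age_at_diagnosis": rng.randint(lo, hi)}

rng = random.Random(42)
cohort = [sample_patient("sickle_cell_disease", rng) for _ in range(1000)]
share = sum(p["race"] == "black" for p in cohort) / len(cohort)
print(round(share, 1))  # close to the configured 0.9
```

Checking that the output cohort reproduces the configured distributions, as in the last two lines, mirrors the paper's study of the statistical properties of the generated data.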


Subjects
Rare Diseases, Software, Humans, United States, Male, Female
6.
Sci Rep ; 14(1): 14412, 2024 06 22.
Article in English | MEDLINE | ID: mdl-38909025

ABSTRACT

Access to individual-level health data is essential for gaining new insights and advancing science. In particular, modern methods based on artificial intelligence rely on the availability of and access to large datasets. In the health sector, access to individual-level data is often challenging due to privacy concerns. A promising alternative is the generation of fully synthetic data, i.e., data generated through a randomised process that have similar statistical properties as the original data, but do not have a one-to-one correspondence with the original individual-level records. In this study, we use a state-of-the-art synthetic data generation method and perform in-depth quality analyses of the generated data for a specific use case in the field of nutrition. We demonstrate the need for careful analyses of synthetic data that go beyond descriptive statistics and provide valuable insights into how to realise the full potential of synthetic datasets. By extending the methods, but also by thoroughly analysing the effects of sampling from a trained model, we are able to largely reproduce significant real-world analysis results in the chosen use case.
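Why quality checks must go beyond descriptive statistics can be shown with a minimal, entirely hypothetical example: independently resampling each variable reproduces every marginal distribution exactly, yet destroys the associations that real-world analyses depend on.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (len(xs) * statistics.pstdev(xs) * statistics.pstdev(ys))

rng = random.Random(0)
# "Original" data with a genuine association between two variables.
x = [rng.gauss(0, 1) for _ in range(2000)]
y = [0.8 * xi + rng.gauss(0, 0.6) for xi in x]

# A naive "synthetic" dataset: permute each column independently.
# Every marginal statistic is identical, but the association is gone.
x_syn = rng.sample(x, len(x))
y_syn = rng.sample(y, len(y))

print(round(pearson(x, y), 2))       # strong correlation in the original
print(round(pearson(x_syn, y_syn), 2))  # near zero in the naive synthetic data
```

A descriptive-statistics comparison would rate the naive synthetic data as perfect; only an analysis-level check, as the study advocates, reveals the loss.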


Subjects
Data Analysis, Humans, Longitudinal Studies, Artificial Intelligence
7.
Digit Health ; 10: 20552076241248922, 2024.
Article in English | MEDLINE | ID: mdl-38766364

ABSTRACT

Background: The ORCHESTRA project, funded by the European Commission, aims to create a pan-European cohort built on existing and new large-scale population cohorts to help rapidly advance the knowledge related to the prevention of the SARS-CoV-2 infection and the management of COVID-19 and its long-term sequelae. The integration and analysis of the very heterogeneous health data pose the challenge of building an innovative technological infrastructure as the foundation of a dedicated framework for data management that should address regulatory requirements such as the General Data Protection Regulation (GDPR). Methods: The three participating European Supercomputing Centres (CINECA - Italy, CINES - France and HLRS - Germany) designed and deployed a dedicated infrastructure to fulfil the functional requirements for data management to ensure sensitive biomedical data confidentiality/privacy, integrity, and security. Beyond the technological issues, many methodological aspects were considered: the Berlin Institute of Health (BIH) at Charité provided its expertise in data protection, information security, and data harmonisation/standardisation. Results: The resulting infrastructure is based on a multi-layer approach that integrates several security measures to ensure data protection. A centralised Data Collection Platform has been established in the Italian National Hub while, for the use cases in which data sharing is not possible due to privacy restrictions, a distributed approach for Federated Analysis has been adopted. A Data Portal is available as a centralised point of access for non-sensitive data and results, in accordance with the findability, accessibility, interoperability, and reusability (FAIR) data principles. This technological infrastructure has been used to support significant data exchange between population cohorts and to publish important scientific results related to SARS-CoV-2.
Conclusions: Considering the increasing demand for data usage in accordance with the requirements of the GDPR, the experience gained in the project and the infrastructure released for the ORCHESTRA project can act as a model for managing future public health threats. Other projects could benefit from the results achieved by ORCHESTRA by building upon the available standardisation of variables, the design of the architecture, and the process used for GDPR compliance.

8.
Front Med (Lausanne) ; 11: 1378866, 2024.
Article in English | MEDLINE | ID: mdl-38818399

ABSTRACT

Introduction: The open-source software offered by the Observational Health Data Science and Informatics (OHDSI) collective, including the OMOP-CDM, serves as a major backbone for many real-world evidence networks and distributed health data analytics platforms. While container technology has significantly simplified deployments from a technical perspective, regulatory compliance can remain a major hurdle for the setup and operation of such platforms. In this paper, we present OHDSI-Compliance, a comprehensive set of document templates designed to streamline the data protection and information security-related documentation and coordination efforts required to establish OHDSI installations. Methods: To decide on a set of relevant document templates, we first analyzed the legal requirements and associated guidelines with a focus on the General Data Protection Regulation (GDPR). Moreover, we analyzed the software architecture of a typical OHDSI stack and related its components to the different general types of concepts and documentation identified. Then, we created those documents for a prototypical OHDSI installation, based on the so-called Broadsea package, following relevant guidelines from Germany. Finally, we generalized the documents by introducing placeholders and options at places where individual institution-specific content will be needed. Results: We present four documents: (1) a record of processing activities, (2) an information security concept, (3) an authorization concept, as well as (4) an operational concept covering the technical details of maintaining the stack. The documents are publicly available under a permissive license. Discussion: To the best of our knowledge, there are no other publicly available sets of documents designed to simplify the compliance process for OHDSI deployments. 
While our documents provide a comprehensive starting point, local specifics need to be added, and, due to the heterogeneity of legal requirements in different countries, further adaptations might be necessary.

9.
Article in German | MEDLINE | ID: mdl-38753020

ABSTRACT

Healthcare-associated infections (HCAIs) represent an enormous burden for patients, healthcare workers, relatives and society worldwide, including Germany. The central tasks of infection prevention are recording and evaluating infections with the aim of identifying prevention potential and risk factors, taking appropriate measures and finally evaluating them. From an infection prevention perspective, it would be of great value if (i) the recording of infection cases was automated and (ii) if it were possible to identify particularly vulnerable patients and patient groups in advance, who would benefit from specific and/or additional interventions. To achieve this risk-adapted, individualized infection prevention, the RISK PRINCIPE research project develops algorithms and computer-based applications based on standardised, large datasets and incorporates expertise in the field of infection prevention. The project has two objectives: a) to develop and validate a semi-automated surveillance system for hospital-acquired bloodstream infections, prototypically for HCAI, and b) to use comprehensive patient data from different sources to create an individual or group-specific infection risk profile. RISK PRINCIPE is based on bringing together the expertise of medical informatics and infection medicine with a focus on hygiene and draws on information and experience from two consortia (HiGHmed and SMITH) of the German Medical Informatics Initiative (MII), which have been working on use cases in infection medicine for more than five years.


Subjects
Cross Infection, Humans, Algorithms, Cross Infection/prevention & control, Cross Infection/epidemiology, Germany/epidemiology, Infection Control/methods, Infection Control/standards, Population Surveillance/methods, Risk Assessment/methods, Risk Factors
10.
Article in German | MEDLINE | ID: mdl-38753022

ABSTRACT

The Interoperability Working Group of the Medical Informatics Initiative (MII) is the platform for the coordination of overarching procedures, data structures, and interfaces between the data integration centers (DIC) of the university hospitals and national and international interoperability committees. The goal is the joint content-related and technical design of a distributed infrastructure for the secondary use of healthcare data that can be used via the Research Data Portal for Health. Important general conditions are data privacy and IT security for the use of health data in biomedical research. To this end, suitable methods are used in dedicated task forces to enable procedural, syntactic, and semantic interoperability for data use projects. The MII core dataset was developed as several modules with corresponding information models and implemented using the HL7® FHIR® standard to enable content-related and technical specifications for the interoperable provision of healthcare data through the DIC. International terminologies and consented metadata are used to describe these data in more detail. The overall architecture, including overarching interfaces, implements the methodological and legal requirements for a distributed data use infrastructure, for example, by providing pseudonymized data or by federated analyses. With these results of the Interoperability Working Group, the MII is presenting a future-oriented solution for the exchange and use of healthcare data, the applicability of which goes beyond the purpose of research and can play an essential role in the digital transformation of the healthcare system.
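The FHIR-based provision of pseudonymized data can be pictured with a minimal sketch of an HL7 FHIR Patient resource as a data integration center might expose it. The identifier system URL and pseudonym value below are hypothetical, and real MII core data set profiles define far more structure; the point is simply that direct identifiers are absent and only a pseudonym plus coarse demographics remain.

```python
import json

# Minimal, hypothetical sketch of a pseudonymized HL7 FHIR Patient
# resource: no name or address, only a pseudonym and coarse demographics.
patient = {
    "resourceType": "Patient",
    "identifier": [{
        "system": "https://dic.example.org/fhir/sid/pseudonym",  # hypothetical
        "value": "PSN-4F7K2Q",
    }],
    "gender": "female",
    "birthDate": "1970",  # year-only precision lowers re-identification risk
}

payload = json.dumps(patient)       # what an interface would transmit
received = json.loads(payload)
print(received["identifier"][0]["value"])  # PSN-4F7K2Q
```

FHIR permits reduced-precision dates such as a bare year, which is one simple way such profiles can trade detail for privacy.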


Subjects
Health Information Interoperability, Humans, Datasets as Topic, Electronic Health Records, Germany, Health Information Interoperability/standards, Medical Informatics, Medical Record Linkage/methods, Systems Integration
11.
JMIR Med Inform ; 12: e49646, 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38654577

ABSTRACT

Background: The SARS-CoV-2 pandemic has demonstrated once again that rapid collaborative research is essential for the future of biomedicine. Large research networks are needed to collect, share, and reuse data and biosamples to generate collaborative evidence. However, setting up such networks is often complex and time-consuming, as common tools and policies are needed to ensure interoperability and the required flows of data and samples, especially for handling personal data and the associated data protection issues. In biomedical research, pseudonymization detaches directly identifying details from biomedical data and biosamples and connects them using secure identifiers, the so-called pseudonyms. This protects privacy by design but allows the necessary linkage and reidentification. Objective: Although pseudonymization is used in almost every biomedical study, there are currently no pseudonymization tools that can be rapidly deployed across many institutions. Moreover, using centralized services is often not possible, for example, when data are reused and consent for this type of data processing is lacking. We present the ORCHESTRA Pseudonymization Tool (OPT), developed under the umbrella of the ORCHESTRA consortium, which faced exactly these challenges when it came to rapidly establishing a large-scale research network in the context of the rapid pandemic response in Europe. Methods: To overcome challenges caused by the heterogeneity of IT infrastructures across institutions, the OPT was developed based on programmable runtime environments available at practically every institution: office suites. The software is highly configurable and provides many features, from subject and biosample registration to record linkage and the printing of machine-readable codes for labeling biosample tubes. 
Special care has been taken to ensure that the algorithms implemented are efficient so that the OPT can be used to pseudonymize large data sets, which we demonstrate through a comprehensive evaluation. Results: The OPT is available for Microsoft Office and LibreOffice, so it can be deployed on Windows, Linux, and MacOS. It provides multiuser support and is configurable to meet the needs of different types of research projects. Within the ORCHESTRA research network, the OPT has been successfully deployed at 13 institutions in 11 countries in Europe and beyond. As of June 2023, the software manages data about more than 30,000 subjects and 15,000 biosamples. Over 10,000 labels have been printed. The results of our experimental evaluation show that the OPT offers practical response times for all major functionalities, pseudonymizing 100,000 subjects in 10 seconds using Microsoft Excel and in 54 seconds using LibreOffice. Conclusions: Innovative solutions are needed to make the process of establishing large research networks more efficient. The OPT, which leverages the runtime environment of common office suites, can be used to rapidly deploy pseudonymization and biosample management capabilities across research networks. The tool is highly configurable and available as open-source software.
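The core principle described here, detaching identifying details from research data while preserving linkage, can be sketched in a few lines. This is an illustration of the general mechanism only, not the OPT's actual format or code; the alphabet, pseudonym length, and registry design are invented for the example.

```python
import secrets

class PseudonymRegistry:
    """Sketch of the pseudonymization principle: identifying data stay in a
    local lookup table, while downstream systems only ever see the random
    pseudonym. Re-identification is possible only via the registry."""
    ALPHABET = "ACDEFGHJKLMNPQRTUVWXYZ234679"  # avoids ambiguous characters

    def __init__(self):
        self._by_identity = {}   # identifying key -> pseudonym
        self._by_pseudonym = {}  # pseudonym -> identifying key

    def pseudonymize(self, identity):
        if identity in self._by_identity:        # record linkage:
            return self._by_identity[identity]   # same person, same pseudonym
        while True:  # draw until unique
            p = "".join(secrets.choice(self.ALPHABET) for _ in range(8))
            if p not in self._by_pseudonym:
                break
        self._by_identity[identity] = p
        self._by_pseudonym[p] = identity
        return p

    def reidentify(self, pseudonym):
        return self._by_pseudonym[pseudonym]

reg = PseudonymRegistry()
p1 = reg.pseudonymize(("Jane", "Doe", "1970-01-01"))
p2 = reg.pseudonymize(("Jane", "Doe", "1970-01-01"))
print(p1 == p2)  # True: linkage preserved across repeated registrations
```

Returning the same pseudonym for a repeated registration is what allows later biosamples and data of the same participant to be linked without exposing the identity.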

12.
J Med Internet Res ; 26: e49445, 2024 04 24.
Article in English | MEDLINE | ID: mdl-38657232

ABSTRACT

BACKGROUND: Sharing data from clinical studies can accelerate scientific progress, improve transparency, and increase the potential for innovation and collaboration. However, privacy concerns remain a barrier to data sharing. Certain concerns, such as reidentification risk, can be addressed through the application of anonymization algorithms, whereby data are altered so that they are no longer reasonably related to a person. Yet, such alterations have the potential to influence the data set's statistical properties, such that the privacy-utility trade-off must be considered. This has been studied in theory, but evidence based on real-world individual-level clinical data is rare, and anonymization has not broadly been adopted in clinical practice. OBJECTIVE: The goal of this study is to contribute to a better understanding of anonymization in the real world by comprehensively evaluating the privacy-utility trade-off of differently anonymized data using data and scientific results from the German Chronic Kidney Disease (GCKD) study. METHODS: The GCKD data set extracted for this study consists of 5217 records and 70 variables. A 2-step procedure was followed to determine which variables constituted reidentification risks. To capture a large portion of the risk-utility space, we decided on risk thresholds ranging from 0.02 to 1. The data were then transformed via generalization and suppression, and the anonymization process was varied using a generic and a use case-specific configuration. To assess the utility of the anonymized GCKD data, general-purpose metrics (ie, data granularity and entropy), as well as use case-specific metrics (ie, reproducibility), were applied. Reproducibility was assessed by measuring the overlap of the 95% CI lengths between anonymized and original results. RESULTS: Reproducibility measured by 95% CI overlap was higher than utility obtained from general-purpose metrics.
For example, granularity varied between 68.2% and 87.6%, and entropy varied between 25.5% and 46.2%, whereas the average 95% CI overlap was above 90% for all risk thresholds applied. A nonoverlapping 95% CI was detected in 6 estimates across all analyses, but the overwhelming majority of estimates exhibited an overlap over 50%. The use case-specific configuration outperformed the generic one in terms of actual utility (ie, reproducibility) at the same level of privacy. CONCLUSIONS: Our results illustrate the challenges that anonymization faces when aiming to support multiple likely and possibly competing uses, while use case-specific anonymization can provide greater utility. This aspect should be taken into account when evaluating the associated costs of anonymized data and attempting to maintain sufficiently high levels of privacy for anonymized data. TRIAL REGISTRATION: German Clinical Trials Register DRKS00003971; https://drks.de/search/en/trial/DRKS00003971. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR2-10.1093/ndt/gfr456.
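The reproducibility metric, overlap of 95% confidence intervals between original and anonymized results, can be sketched as follows. The exact formula used in the study is an assumption here; the sketch follows a common definition that averages the overlap length relative to each interval's own length.

```python
def ci_overlap(orig, anon):
    """Overlap of two confidence intervals given as (low, high) tuples.

    Returns 1.0 for identical intervals, 0.0 for disjoint ones; the
    averaging over both interval lengths is one common convention and
    assumed here, not taken from the paper.
    """
    lo = max(orig[0], anon[0])
    hi = min(orig[1], anon[1])
    overlap = max(0.0, hi - lo)  # length of the shared segment
    return 0.5 * (overlap / (orig[1] - orig[0]) + overlap / (anon[1] - anon[0]))

print(ci_overlap((1.0, 2.0), (1.0, 2.0)))  # 1.0
print(ci_overlap((1.0, 2.0), (2.5, 3.0)))  # 0.0
```

Averaging an estimate's overlap across all analyses and risk thresholds yields aggregate figures like the "above 90%" reported in the abstract.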


Subjects
Data Anonymization, Humans, Renal Insufficiency, Chronic/therapy, Information Dissemination/methods, Algorithms, Germany, Confidentiality, Privacy
13.
JMIR Med Inform ; 12: e53075, 2024 Apr 18.
Article in English | MEDLINE | ID: mdl-38632712

ABSTRACT

Background: Pseudonymization has become a best practice to securely manage the identities of patients and study participants in medical research projects and data sharing initiatives. This method offers the advantage of not requiring the direct identification of data to support various research processes while still allowing for advanced processing activities, such as data linkage. Often, pseudonymization and related functionalities are bundled in specific technical and organization units known as trusted third parties (TTPs). However, pseudonymization can significantly increase the complexity of data management and research workflows, necessitating adequate tool support. Common tasks of TTPs include supporting the secure registration and pseudonymization of patient and sample identities as well as managing consent. Objective: Despite the challenges involved, little has been published about successful architectures and functional tools for implementing TTPs in large university hospitals. The aim of this paper is to fill this research gap by describing the software architecture and tool set developed and deployed as part of a TTP established at Charité - Universitätsmedizin Berlin. Methods: The infrastructure for the TTP was designed to provide a modular structure while keeping maintenance requirements low. Basic functionalities were realized with the free MOSAIC tools. However, supporting common study processes requires implementing workflows that span different basic services, such as patient registration, followed by pseudonym generation and concluded by consent collection. To achieve this, an integration layer was developed to provide a unified Representational state transfer (REST) application programming interface (API) as a basis for more complex workflows. Based on this API, a unified graphical user interface was also implemented, providing an integrated view of information objects and workflows supported by the TTP. 
The API was implemented using Java and Spring Boot, while the graphical user interface was implemented in PHP and Laravel. Both services use a shared Keycloak instance as a unified management system for roles and rights. Results: By the end of 2022, the TTP had already supported more than 10 research projects since its launch in December 2019. Within these projects, more than 3000 identities were stored, more than 30,000 pseudonyms were generated, and more than 1500 consent forms were submitted. In total, more than 150 people regularly work with the software platform. By implementing the integration layer and the unified user interface, together with comprehensive roles and rights management, the effort for operating the TTP could be significantly reduced, as personnel of the supported research projects can use many functionalities independently. Conclusions: With the architecture and components described, we created a user-friendly and compliant environment for supporting research projects. We believe that the insights into the design and implementation of our TTP can help other institutions to efficiently and effectively set up corresponding structures.

14.
Article in German | MEDLINE | ID: mdl-38639817

ABSTRACT

BACKGROUND: Digitalization in the healthcare sector promises secondary use of patient data in the sense of a learning healthcare system. To this end, the Medical Informatics Initiative's (MII) Consent Working Group has created an ethical and legal basis with standardized consent documents. This paper describes the systematically monitored introduction of these documents at the MII sites. METHODS: The monitoring of the introduction included regular online surveys, an in-depth analysis of the introduction processes at selected sites, and an assessment of the documents in use. In addition, inquiries and feedback from a large number of stakeholders were evaluated. RESULTS: The online surveys showed that 27 of the 32 sites have gradually introduced the consent documents productively, with a current total of 173,289 consents. The analysis of the implementation procedures revealed heterogeneous organizational conditions at the sites. The requirements of various stakeholders were met by developing and providing supplementary versions of the consent documents and additional information materials. DISCUSSION: The introduction of the MII consent documents at the university hospitals creates a uniform legal basis for the secondary use of patient data. However, comprehensive implementation within the sites remains challenging. Therefore, minimum requirements for patient information and supplementary recommendations for best practice must be developed. The further development of the national legal framework for research will not render the participation and transparency mechanisms developed here obsolete.


Assuntos
Consentimento Livre e Esclarecido , Alemanha , Consentimento Livre e Esclarecido/legislação & jurisprudência , Consentimento Livre e Esclarecido/normas , Humanos , Registros Eletrônicos de Saúde/legislação & jurisprudência , Registros Eletrônicos de Saúde/normas , Termos de Consentimento/normas , Termos de Consentimento/legislação & jurisprudência , Programas Nacionais de Saúde/legislação & jurisprudência
15.
Article in German | MEDLINE | ID: mdl-38684526

ABSTRACT

Healthcare data are an important resource in applied medical research and are available across many sites. However, it remains a challenge to enable standardized data exchange processes between federal states with their individual laws and regulations. The Medical Informatics Initiative (MII) was founded in 2016 to implement processes that enable cross-clinic access to healthcare data in Germany. Several working groups (WGs) have been set up to coordinate standardized data structures (WG Interoperability), patient information and declarations of consent (WG Consent), and regulations on data exchange (WG Data Sharing). Here we present the most important results of the Data Sharing working group, which include agreed terms of use, legal regulations, and data access processes. They are already being implemented by the established Data Integration Centers (DIZ) and Use and Access Committees (UACs). We describe the services that are necessary to provide researchers with standardized data access. They are implemented with the Research Data Portal for Health, among others. Since the pilot phase, 385 active researchers have used these processes, which, as of April 2024, has resulted in 19 registered projects and 31 submitted research applications.


Assuntos
Registros Eletrônicos de Saúde , Disseminação de Informação , Humanos , Pesquisa Biomédica , Registros Eletrônicos de Saúde/estatística & dados numéricos , Alemanha , Pesquisa sobre Serviços de Saúde , Informática Médica , Registro Médico Coordenado/métodos , Modelos Organizacionais
16.
Article in German | MEDLINE | ID: mdl-38175194

ABSTRACT

The increasing digitization of the healthcare system is leading to a growing volume of health data. Leveraging this data beyond its initial collection purpose for secondary use can provide valuable insights into diagnostics, treatment processes, and the quality of care. The Health Data Lab (HDL) will provide infrastructure for this purpose. Both the protection of patient privacy and optimal analytical capabilities are of central importance in this context, and artificial intelligence (AI) provides two opportunities. First, it enables the analysis of large volumes of data with flexible models, which means that hidden correlations and patterns can be discovered. Second, synthetic - that is, artificial - data generated by AI can protect privacy. This paper describes the KI-FDZ project, which aims to investigate innovative technologies that can support the secure provision of health data for secondary research purposes. A multi-layered approach is investigated in which data-level measures can be combined in different ways with processing in secure environments. To this end, anonymization and synthetization methods, among others, are evaluated based on two concrete application examples. Moreover, it is examined how the creation of machine learning pipelines and the execution of AI algorithms can be supported in secure processing environments. Preliminary results indicate that this approach can achieve a high level of protection while maintaining data validity. The approach investigated in the project can be an important building block in the secure secondary use of health data.


Subjects
Algorithms, Artificial Intelligence, Humans, Germany, Delivery of Health Care
17.
JMIR Res Protoc ; 12: e46471, 2023 Aug 11.
Article in English | MEDLINE | ID: mdl-37566443

ABSTRACT

BACKGROUND: The anonymization of Common Data Model (CDM)-converted EHR data is essential to ensure data privacy in the use of harmonized health care data. However, applying data anonymization techniques can significantly affect many properties of the resulting data sets and thus bias research results. Few studies have reviewed these applications with a reflection of approaches to manage data utility and quality concerns in the context of CDM-formatted health care data. OBJECTIVE: Our scoping review aims to identify and describe (1) how formal anonymization methods are carried out with CDM-converted health care data, (2) how data quality and utility concerns are considered, and (3) how the various CDMs differ in terms of their suitability for recording anonymized data. METHODS: The planned scoping review is based on the framework of Arksey and O'Malley; following it, only articles published in English will be included. Literature retrieval is based on a search string combining keywords related to data anonymization, CDM standards, and data quality assessment. The proposed search query will be validated by a librarian and accompanied by manual searches to include further informal sources. Eligible articles will first undergo a deduplication step, followed by title screening. Second, a full-text reading will allow the 2 reviewers involved to reach the final decision about article selection, while a domain expert will support the resolution of citation selection conflicts. Additionally, key information will be extracted, categorized, summarized, and analyzed using a proposed template in an iterative process. Tabular and graphical analyses will be presented in alignment with the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) checklist.
We also performed tentative searches on Web of Science to estimate the feasibility of retrieving eligible articles. RESULTS: The tentative searches on Web of Science yielded 507 nonduplicated matches, suggesting that (potentially) relevant articles are available. Further analysis and selection steps will allow us to derive a final literature set. Completion of this scoping review is expected by the end of the fourth quarter of 2023. CONCLUSIONS: Outlining the approaches for applying formal anonymization methods to CDM-formatted health care data while taking data quality and utility concerns into account should provide useful insights into existing approaches and future research directions based on identified gaps. This protocol describes a schedule for performing a scoping review, which should support the conduct of follow-up investigations. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): PRR1-10.2196/46471.
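
The deduplication step described in the methods can be sketched as a normalized-title comparison; dedicated screening tools additionally match on DOIs and use fuzzier similarity measures. All record titles and source labels below are invented for illustration.

```python
def normalize(title):
    """Lowercase a title and drop punctuation/whitespace for comparison."""
    return "".join(ch for ch in title.lower() if ch.isalnum())

def deduplicate(records):
    """Keep only the first record for each normalized title."""
    seen, unique = set(), []
    for rec in records:
        key = normalize(rec["title"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

# Hypothetical hits from two databases; the first two differ only in casing
# and trailing punctuation, so they collapse into one record.
hits = [
    {"title": "Anonymization of CDM-Converted EHR Data", "source": "WoS"},
    {"title": "Anonymization of CDM-converted EHR data.", "source": "PubMed"},
    {"title": "Data Quality in OMOP CDM Studies", "source": "WoS"},
]
print(len(deduplicate(hits)))  # → 2
```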

18.
Nat Commun ; 14(1): 2577, 2023 05 04.
Article in English | MEDLINE | ID: mdl-37142591

ABSTRACT

Access to large volumes of so-called whole-slide images (high-resolution scans of complete pathological slides) has become a cornerstone of the development of novel artificial intelligence methods in pathology for diagnostic use, education/training of pathologists, and research. Nevertheless, a methodology based on risk analysis for evaluating the privacy risks associated with sharing such imaging data and applying the principle "as open as possible and as closed as necessary" is still lacking. In this article, we develop a model for privacy risk analysis of whole-slide images which focuses primarily on identity disclosure attacks, as these are the most important from a regulatory perspective. We introduce a taxonomy of whole-slide images with respect to privacy risks and a mathematical model for risk assessment. Based on this risk assessment model and the taxonomy, we conduct a series of experiments to demonstrate the risks using real-world imaging data. Finally, we develop guidelines for risk assessment and recommendations for low-risk sharing of whole-slide image data.
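
The article's own risk model is not reproduced here, but identity disclosure risk for structured metadata accompanying such images is commonly quantified via equivalence-class sizes ("prosecutor risk": one over the number of records sharing the same quasi-identifier values). A minimal sketch with hypothetical quasi-identifier tuples:

```python
from collections import Counter

def prosecutor_risks(quasi_identifier_tuples):
    """Per-record re-identification risk as 1 / equivalence-class size."""
    sizes = Counter(quasi_identifier_tuples)
    return [1 / sizes[t] for t in quasi_identifier_tuples]

# Hypothetical (sex, birth year) tuples attached to slide metadata
tuples = [("M", 1954), ("M", 1954), ("F", 1961)]
risks = prosecutor_risks(tuples)
print(max(risks))  # → 1.0 (the unique ("F", 1961) record is fully identifying)
```

A maximum risk of 1.0 flags records that are unique in the dataset and would typically need further generalization or suppression before sharing.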


Subjects
Artificial Intelligence, Privacy, Image Processing, Computer-Assisted/methods, Diagnostic Imaging/methods
19.
Stud Health Technol Inform ; 302: 691-695, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203471

ABSTRACT

Making health data available for secondary use enables innovative data-driven medical research. Since modern machine learning (ML) methods and precision medicine require extensive amounts of data covering most of the standard and edge cases, it is essential to initially acquire large datasets. This can typically only be achieved by integrating different datasets from various sources and sharing data across sites. To obtain a unified dataset from heterogeneous sources, standard representations and Common Data Models (CDM) are needed. The process of mapping data into these standardized representations is usually very tedious and requires many manual configuration and refinement steps. A potential way to reduce these efforts is to use ML methods not only for data analysis, but also for the integration of health data on the syntactic, structural, and semantic level. However, research on ML-based medical data integration is still in its infancy. In this article, we describe the current state of the literature and present selected methods that appear to have a particularly high potential to improve medical data integration. Moreover, we discuss open issues and possible future research directions.
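
As a toy illustration of integration on the syntactic level (not a method from the article), schema matching between a source dataset and a CDM can be bootstrapped with simple string similarity; the ML-based approaches the article surveys would replace this with learned representations. The column names and the 0.4 threshold below are hypothetical.

```python
from difflib import SequenceMatcher

def match_columns(source_cols, target_cols, threshold=0.4):
    """Greedy syntactic schema matching: pair each source column with its
    most similar target column, if the similarity exceeds the threshold."""
    mapping = {}
    for s in source_cols:
        best, score = None, threshold
        for t in target_cols:
            sim = SequenceMatcher(None, s.lower(), t.lower()).ratio()
            if sim > score:
                best, score = t, sim
        if best is not None:
            mapping[s] = best
    return mapping

print(match_columns(["patient_id", "birth_date", "sex"],
                    ["person_id", "date_of_birth", "gender"]))
# → {'patient_id': 'person_id', 'birth_date': 'date_of_birth'}
```

Note that the purely syntactic measure misses the semantically obvious pair "sex"/"gender", which is precisely the kind of gap that semantic-level ML methods aim to close.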


Subjects
Biomedical Research, Machine Learning, Semantics
20.
Stud Health Technol Inform ; 302: 28-32, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203603

ABSTRACT

Data sharing provides benefits in terms of transparency and innovation. Privacy concerns in this context can be addressed by anonymization techniques. In our study, we evaluated anonymization approaches that transform structured data in a real-world scenario of a chronic kidney disease cohort study and checked the replicability of research results via 95% CI overlap in two anonymized datasets with different degrees of protection. The calculated 95% CIs overlapped for both anonymization approaches, and visual comparison showed similar results. Thus, in our use case scenario, research results were not relevantly impacted by anonymization, which adds to the growing evidence for utility-preserving anonymization techniques.
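
The 95% CI overlap check used here as a replicability criterion can be sketched as follows; the normal-approximation interval and the sample values are illustrative assumptions, not the study's data or its exact estimator.

```python
import statistics

def ci95(values):
    """Normal-approximation 95% confidence interval for the mean."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / len(values) ** 0.5
    return mean - 1.96 * sem, mean + 1.96 * sem

def intervals_overlap(a, b):
    """True if the two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical measurements before and after an anonymizing transformation
original = [142, 150, 138, 155, 147, 151, 145, 149]
anonymized = [141, 152, 137, 156, 148, 150, 144, 150]
print(intervals_overlap(ci95(original), ci95(anonymized)))  # → True
```

Overlapping intervals suggest the anonymized data support the same statistical conclusion, which is the sense in which the study speaks of replicated results.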


Subjects
Data Anonymization, Privacy, Humans, Cohort Studies, Information Dissemination, Organizations