ABSTRACT
Artificial intelligence (AI) is revolutionizing the field of medical imaging, holding the potential to shift medicine from a reactive "sick-care" approach to a proactive focus on healthcare and prevention. The successful development of AI in this domain relies on access to large, comprehensive, and standardized real-world datasets that accurately represent diverse populations and diseases. However, medical images and the associated data are sensitive, so they must be de-identified to protect patient privacy before any use. This paper explores the approaches of five EU projects working on the creation of ethically compliant, GDPR-regulated European medical imaging platforms focused on cancer-related data. It presents each project's approach to the de-identification of imaging data and describes the problems encountered and the solutions adopted in each case. Lessons learned are also provided, enabling future projects to handle data de-identification optimally. CRITICAL RELEVANCE STATEMENT: This paper presents key approaches from five flagship EU projects for the de-identification of imaging and clinical data, offering valuable insights and guidelines in the domain. KEY POINTS: AI models for health imaging require access to large amounts of data. Access to large imaging datasets requires an appropriate de-identification process. This paper provides de-identification guidelines from the AI for health imaging (AI4HI) projects.
ABSTRACT
Purpose: Previous studies have demonstrated that three-dimensional (3D) volumetric renderings of magnetic resonance imaging (MRI) brain data can be used to identify patients using facial recognition. We have shown that facial features can be identified on simulation computed tomography (CT) images used in radiation oncology and matched to face images from a database. We aim to determine whether CT images can be anonymized using anonymization software designed for T1-weighted MRI data. Approach: Our study examines (1) the ability of off-the-shelf anonymization algorithms to anonymize CT data and (2) the ability of facial recognition algorithms to match the resulting faces against a database of facial images. We generated 3D renderings from 57 head CT scans from The Cancer Imaging Archive database. Data were anonymized using AFNI (deface, reface, and 3Dskullstrip) and FSL's BET. Anonymized data were compared to the original renderings and passed through facial recognition algorithms (VGG-Face, FaceNet, DLib, and SFace) using a facial database (Labeled Faces in the Wild) to determine what matches could be found. Results: All modules were able to process CT data, and AFNI's 3Dskullstrip and FSL's BET consistently showed lower reidentification rates than the original renderings. Conclusions: These results highlight the potential of anonymization algorithms as a clinical standard for deidentifying brain CT data. Our study demonstrates the importance of continued vigilance for patient privacy in publicly shared datasets and of continued evaluation of anonymization methods for CT data.
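The re-identification step in studies like this can be illustrated with a minimal sketch, assuming face embeddings have already been extracted (e.g., by a model such as VGG-Face or FaceNet). The vectors, identities, and threshold below are invented for illustration: a probe rendering matches a gallery identity when the cosine distance between their embeddings falls below a threshold.

```python
import math

def cosine_distance(a, b):
    """Cosine distance between two face-embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def best_match(probe, gallery, threshold=0.4):
    """Return the gallery identity closest to the probe embedding,
    or None if no distance falls below the match threshold."""
    best_id, best_d = None, threshold
    for identity, emb in gallery.items():
        d = cosine_distance(probe, emb)
        if d < best_d:
            best_id, best_d = identity, d
    return best_id

# Toy gallery of labeled embeddings (hypothetical 3-D vectors).
gallery = {"subject_a": [1.0, 0.0, 0.0], "subject_b": [0.0, 1.0, 0.0]}
print(best_match([0.9, 0.1, 0.0], gallery))  # -> subject_a
```

A successful defacing method should push every probe's distance above the threshold, so the function returns no match.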
ABSTRACT
Artificial intelligence (AI) is transforming the field of medical imaging and has the potential to bring medicine from the era of 'sick-care' to the era of healthcare and prevention. The development of AI requires access to large, complete, and harmonized real-world datasets, representative of population and disease diversity. To date, however, efforts are fragmented, based on single-institution, size-limited, and annotation-limited datasets. Available public datasets (e.g., The Cancer Imaging Archive, TCIA, USA) are limited in scope, hampering model generalizability. In this direction, five European Union projects are currently working on the development of big data infrastructures that will enable European, ethically and General Data Protection Regulation-compliant, quality-controlled, cancer-related medical imaging platforms, in which both large-scale data and AI algorithms will coexist. The vision is to create sustainable AI cloud-based platforms for the development, implementation, verification, and validation of trustworthy, usable, and reliable AI models that address specific unmet needs in cancer care provision. In this paper, we present an overview of the development efforts, highlighting the challenges and the approaches selected, providing valuable feedback to future attempts in the area.
Key points:
• Artificial intelligence models for health imaging require access to large amounts of harmonized imaging data and metadata.
• The main infrastructures adopted either collect centrally anonymized data or enable access to pseudonymized distributed data.
• Developing a common data model for storing all relevant information is a challenge.
• Trust of data providers in data sharing initiatives is essential.
• An online European Union meta-tool repository is a necessity, minimizing effort duplication across the projects in the area.
Subjects
Artificial Intelligence, Neoplasms, Humans, Diagnostic Imaging, Forecasting, Big Data
ABSTRACT
Objectives: Academic institutions have access to comprehensive sets of real-world data. However, their potential for secondary use (for example, in medical outcomes research or health care quality management) is often limited due to data privacy concerns. External partners could help realize this potential, yet documented frameworks for such cooperation are lacking. This work therefore presents a pragmatic approach for enabling academic-industrial data partnerships in a health care environment. Methods: We employ a value-swapping strategy to facilitate data sharing. Using tumor documentation and molecular pathology data, we define a data-altering process as well as rules for an organizational pipeline that includes the technical anonymization process. Results: The resulting dataset was fully anonymized while retaining the critical properties of the original data, allowing for external development and training of analytical algorithms. Conclusion: Value swapping is a pragmatic yet powerful method to balance data privacy with the requirements of algorithm development; it is therefore well suited to enabling academic-industrial data partnerships.
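The core idea of value swapping can be sketched in a few lines. This is a hypothetical illustration with invented records, not the study's actual pipeline: the values of one attribute are permuted across records, so aggregate statistics over that attribute are preserved while the link between record and value is broken.

```python
import random

def swap_values(records, field, seed=0):
    """Permute one attribute across all records: the marginal
    distribution of `field` is preserved, but record-level
    linkage between identity and value is broken."""
    rng = random.Random(seed)
    values = [r[field] for r in records]
    rng.shuffle(values)
    return [{**r, field: v} for r, v in zip(records, values)]

# Toy records (invented for illustration).
patients = [
    {"id": 1, "age": 63, "mutation": "KRAS"},
    {"id": 2, "age": 71, "mutation": "EGFR"},
    {"id": 3, "age": 58, "mutation": "TP53"},
]
swapped = swap_values(patients, "mutation")
# The multiset of mutations is unchanged; the row-level pairing may not be.
```

Because the marginal distribution survives the swap, population-level algorithm development remains possible on the altered data.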
ABSTRACT
PURPOSE: Recent advances in Surgical Data Science (SDS) have contributed to an increase in video recordings from hospital environments. While methods such as surgical workflow recognition show potential for increasing the quality of patient care, the quantity of video data has surpassed the scale at which images can be manually anonymized. Existing automated 2D anonymization methods underperform in operating rooms (ORs) due to occlusions and obstructions. We propose to anonymize multi-view OR recordings using 3D data from multiple camera streams. METHODS: RGB and depth images from multiple cameras are fused into a 3D point cloud representation of the scene. We then detect each individual's face in 3D by regressing a parametric human mesh model onto detected 3D human keypoints and aligning the face mesh with the fused 3D point cloud. The mesh model is rendered into every acquired camera view, replacing each individual's face. RESULTS: Our method shows promise in locating faces at a higher rate than existing approaches. DisguisOR produces geometrically consistent anonymizations for each camera view, enabling more realistic anonymization that is less detrimental to downstream tasks. CONCLUSION: Frequent obstructions and crowding in operating rooms leave significant room for improvement for off-the-shelf anonymization methods. DisguisOR addresses privacy at the scene level and has the potential to facilitate further research in SDS.
Subjects
Operating Rooms, Humans, Video Recording
ABSTRACT
Background: Demand for head and neck cancer (HNC) radiotherapy data in algorithm development has prompted increased image dataset sharing. Medical images must comply with data protection requirements so that re-use is enabled without disclosing patient identifiers. Defacing, i.e., the removal of facial features from images, is often considered a reasonable compromise between data protection and re-usability for neuroimaging data. While defacing tools have been developed by the neuroimaging community, their acceptability for radiotherapy applications has not been explored. Therefore, this study systematically investigated the impact of available defacing algorithms on HNC organs at risk (OARs). Methods: A publicly available dataset of magnetic resonance imaging scans for 55 HNC patients with eight segmented OARs (bilateral submandibular glands, parotid glands, level II neck lymph nodes, and level III neck lymph nodes) was utilized. Eight publicly available defacing algorithms were investigated: afni_refacer, DeepDefacer, defacer, fsl_deface, mask_face, mri_deface, pydeface, and quickshear. Using a subset of scans where defacing succeeded (N=29), a 5-fold cross-validation 3D U-Net based OAR auto-segmentation model was utilized to perform two main experiments: (1) comparing original and defaced data for training when evaluated on original data; (2) using original data for training and comparing model evaluation on original and defaced data. Models were primarily assessed using the Dice similarity coefficient (DSC). Results: Most defacing methods were unable to produce any usable images for evaluation, while mask_face, fsl_deface, and pydeface failed to remove the face for 29%, 18%, and 24% of subjects, respectively.
When using the original data for evaluation, the composite OAR DSC was statistically higher (p ≤ 0.05) for the model trained with the original data with a DSC of 0.760 compared to the mask_face, fsl_deface, and pydeface models with DSCs of 0.742, 0.736, and 0.449, respectively. Moreover, the model trained with original data had decreased performance (p ≤ 0.05) when evaluated on the defaced data with DSCs of 0.673, 0.693, and 0.406 for mask_face, fsl_deface, and pydeface, respectively. Conclusion: Defacing algorithms may have a significant impact on HNC OAR auto-segmentation model training and testing. This work highlights the need for further development of HNC-specific image anonymization methods.
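The Dice similarity coefficient reported above has a standard definition, 2|A∩B| / (|A| + |B|). A minimal sketch for binary masks, using toy inputs rather than the study's data:

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks,
    given as flat sequences of 0/1 voxel labels."""
    intersection = sum(a and b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    if total == 0:
        return 1.0  # both masks empty: perfect agreement by convention
    return 2.0 * intersection / total

original = [1, 1, 1, 0, 0, 1]   # ground-truth OAR voxels (toy)
predicted = [1, 1, 0, 0, 1, 1]  # auto-segmentation output (toy)
print(round(dice_coefficient(original, predicted), 3))  # -> 0.75
```

A DSC of 1.0 indicates identical masks; the composite scores in the study average this metric over the segmented OARs.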
ABSTRACT
Despite great advances in brain tumor segmentation and clear clinical need, translation of state-of-the-art computational methods into clinical routine and scientific practice remains a major challenge. Several factors impede successful implementation, including data standardization and preprocessing. However, these steps are pivotal for the deployment of state-of-the-art image segmentation algorithms. To overcome these issues, we present BraTS Toolkit, a holistic approach to brain tumor segmentation consisting of three components: First, the BraTS Preprocessor facilitates data standardization and preprocessing for researchers and clinicians alike. It covers the entire image analysis workflow prior to tumor segmentation, from image conversion and registration to brain extraction. Second, the BraTS Segmentor enables orchestration of BraTS brain tumor segmentation algorithms for generation of fully automated segmentations. Finally, the BraTS Fusionator can combine the resulting candidate segmentations into consensus segmentations using fusion methods such as majority voting and iterative SIMPLE fusion. The capabilities of our tools are illustrated with a practical example to enable easy translation to clinical and scientific practice.
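The majority-voting fusion mentioned above follows a simple voxel-wise rule. A minimal sketch for binary candidate segmentations, using toy data rather than the BraTS Fusionator API: a voxel belongs to the consensus foreground when more than half of the candidates label it so.

```python
def majority_vote(segmentations):
    """Fuse candidate binary segmentations voxel-wise: a voxel is
    foreground in the consensus if a strict majority of candidate
    segmentations labels it as foreground."""
    n = len(segmentations)
    return [
        1 if sum(voxels) * 2 > n else 0
        for voxels in zip(*segmentations)
    ]

# Three toy candidate segmentations over four voxels.
candidates = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
]
print(majority_vote(candidates))  # -> [1, 1, 0, 0]
```

Fusion of this kind tends to suppress idiosyncratic errors of individual algorithms, which is why consensus segmentations are often more robust than any single candidate.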
ABSTRACT
BACKGROUND: Modern data-driven medical research promises to provide new insights into the development and course of disease and to enable novel methods of clinical decision support. To realize this, machine learning models can be trained to make predictions from clinical, paraclinical and biomolecular data. In this process, privacy protection and regulatory requirements need careful consideration, as the resulting models may leak sensitive personal information. To counter this threat, a wide range of methods for integrating machine learning with formal methods of privacy protection have been proposed. However, there is a significant lack of practical tools to create and evaluate such privacy-preserving models. In this software article, we report on our ongoing efforts to bridge this gap. RESULTS: We have extended the well-known ARX anonymization tool for biomedical data with machine learning techniques to support the creation of privacy-preserving prediction models. Our methods are particularly well suited for applications in biomedicine, as they preserve the truthfulness of data (e.g., no noise is added) and they are intuitive and relatively easy to explain to non-experts. Moreover, our implementation is highly versatile, as it supports binomial and multinomial target variables, different types of prediction models and a wide range of privacy protection techniques. All methods have been integrated into a sound framework that supports the creation, evaluation and refinement of models through intuitive graphical user interfaces. To demonstrate the broad applicability of our solution, we present three case studies in which we created and evaluated different types of privacy-preserving prediction models for breast cancer diagnosis, diagnosis of acute inflammation of the urinary system and prediction of the contraceptive method used by women.
In this process, we also used a wide range of different privacy models (k-anonymity, differential privacy and a game-theoretic approach) as well as different data transformation techniques. CONCLUSIONS: With the tool presented in this article, accurate prediction models can be created that preserve the privacy of individuals represented in the training set in a variety of threat scenarios. Our implementation is available as open source software.
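Of the privacy models mentioned, k-anonymity is the simplest to illustrate: every combination of quasi-identifier values must occur in at least k records. A minimal check over toy records (invented for illustration, not ARX's API):

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Check k-anonymity: every combination of quasi-identifier
    values must be shared by at least k records."""
    groups = Counter(
        tuple(r[q] for q in quasi_identifiers) for r in records
    )
    return all(count >= k for count in groups.values())

# Toy records; age_band and zip3 act as quasi-identifiers.
rows = [
    {"age_band": "60-69", "zip3": "941", "diagnosis": "breast cancer"},
    {"age_band": "60-69", "zip3": "941", "diagnosis": "inflammation"},
    {"age_band": "70-79", "zip3": "941", "diagnosis": "breast cancer"},
]
print(is_k_anonymous(rows, ["age_band", "zip3"], 2))  # -> False
```

Tools such as ARX achieve k-anonymity by generalizing quasi-identifier values (e.g., widening age bands) until every group reaches size k, which keeps the data truthful since no noise is added.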
Subjects
Confidentiality, Data Anonymization, Clinical Decision Support Systems, Statistical Models, Software, Biomedical Research, Humans, Machine Learning, ROC Curve, Reproducibility of Results
ABSTRACT
BACKGROUND: Since the advent of antiretroviral therapy, the life expectancy of HIV-positive patients has increased, and with it the challenges of ageing. The international literature demonstrates that HIV-positive patients are more fragile than persons of the same age without the viral infection. OBJECTIVE: To design, develop, and test a new web-based instrument allowing self-administration of the new SELFY MPI questionnaire created during the European project Effichronic. MATERIALS & METHODS: Between June and September 2018, a group of 50 senior HIV-positive patients was involved. The SELFY MPI questionnaire collects data on quality of life and cognitive functions. RESULTS: The developed web instrument collects pseudo-anonymous data into the Liguria HIV Network database. The subsequent statistical analysis highlighted a correlation between the two outcomes of SELFY MPI and the laboratory parameters CD4+ T-cell count and viral load. CONCLUSIONS: The great potential of this instrument lies not only in its support of clinical research on the effects of HIV on chronic disease management, but also in its possible use as a follow-up instrument to evaluate different aspects of the geriatric patient's life over the years.
Subjects
Frailty, HIV Infections, Internet, Patient Generated Health Data, Quality of Life, Aged, Frail Elderly, HIV, HIV Infections/complications, Humans, Male, Surveys and Questionnaires
ABSTRACT
Technological advancements in recent years have sparked the use of large databases for research. The availability of these large databases has created a need for anonymization and de-identification techniques prior to publishing the data. De-identification alters the data, which in turn can affect the results derived after de-identification and potentially lead to false conclusions. The objective of this study is to investigate whether alterations to a de-identified time-to-event data set may improve the accuracy of the estimates. In this data set, a missing-time bias was present among censored patients as a means to preserve patient confidentiality. This study investigates five methods intended to reduce the bias of time-to-event estimates. A simulation study was conducted to evaluate the effectiveness of each method in reducing bias. When there was a large number of censored patients, the simulation showed that Method 4 yielded the most accurate estimates. This method adjusted the survival times of censored patients by adding a random uniform component such that the modified survival time would fall within the final year of the study. Alternatively, when there was only a small number of censored patients, the method that did not alter the de-identified data set (Method 1) provided the most accurate estimates.
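One plausible reading of the Method 4 adjustment is to redraw each censored time uniformly from the portion of the final study year at or after the original time. The sketch below follows that reading only; the paper's exact procedure may differ, and the data are invented.

```python
import random

def adjust_censored_times(times, censored, study_end, seed=0):
    """Method-4-style adjustment (one plausible reading): each
    censored survival time is shifted by a random uniform component
    so that the modified time falls within the final year of the
    study; event (non-censored) times are left untouched."""
    rng = random.Random(seed)
    adjusted = []
    for t, is_censored in zip(times, censored):
        if is_censored:
            # Draw uniformly from [max(t, study_end - 1), study_end].
            lower = max(t, study_end - 1.0)
            adjusted.append(rng.uniform(lower, study_end))
        else:
            adjusted.append(t)
    return adjusted

# Toy data: times in years, second patient censored, 5-year study.
out = adjust_censored_times([2.0, 4.5], [False, True], study_end=5.0)
```

Pushing censored times toward the study end counteracts the downward bias that missing censoring times introduce into survival estimates.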
Subjects
Data Anonymization, Survival Analysis, Antineoplastic Agents/therapeutic use, Bias, Confidentiality, Statistical Data Interpretation, Humans, Lung Neoplasms/drug therapy, Registries, Treatment Outcome
ABSTRACT
Translational clinical research is often characterized by a unidirectional information flow from clinical to molecular data, using phenotypes to elucidate molecular disease processes. Here we present the RESIST study, which uses xenograft information for individual treatment decisions after resistance to a specific anticancer treatment, establishing a bidirectional information flow between patient and molecular biology. The paper discusses the specific challenges related to the IT infrastructure for such bidirectional translational projects and proposes solutions. A specific focus is the safeguarding of genomic privacy.
Subjects
Data Anonymization, Genetic Privacy, Heterografts, Medical Informatics/methods, Clinical Trials as Topic/standards, Colorectal Neoplasms/drug therapy, Colorectal Neoplasms/genetics, Colorectal Neoplasms/surgery, Humans, Translational Biomedical Research/methods, Translational Biomedical Research/organization & administration
ABSTRACT
Since the early 2000s, much of the neuroimaging work at Washington University (WU) has been facilitated by the Central Neuroimaging Data Archive (CNDA), an XNAT-based imaging informatics system. The CNDA is uniquely related to XNAT, as it served as the original codebase for the XNAT open source platform. The CNDA hosts data acquired in over 1000 research studies, encompassing 36,000 subjects and more than 60,000 imaging sessions. Most imaging modalities used in modern human research are represented in the CNDA, including magnetic resonance (MR), positron emission tomography (PET), computed tomography (CT), nuclear medicine (NM), computed radiography (CR), digital radiography (DX), and ultrasound (US). However, the majority of the imaging data in the CNDA are MR and PET of the human brain. Currently, about 20% of the total imaging data in the CNDA is available by request to external researchers. The CNDA's available data includes large sets of imaging sessions and, in some cases, clinical, psychometric, tissue, or genetic data acquired in the study of Alzheimer's disease, brain metabolism, cancer, HIV, sickle cell anemia, and Tourette syndrome.