ABSTRACT
BACKGROUND: The integration of speech into healthcare has intensified privacy concerns, since speech is a non-invasive biomarker that carries individual biometric information. In response, speaker anonymization aims to conceal personally identifiable information while retaining the linguistic content that matters clinically. However, the application of anonymization techniques to pathological speech, a critical area where privacy is especially vital, has not been extensively examined. METHODS: This study investigates the impact of anonymization on pathological speech across more than 2,700 speakers from multiple German institutions, focusing on privacy, pathological utility, and demographic fairness. We explore both deep-learning-based and signal-processing-based anonymization methods. RESULTS: We document substantial privacy improvements across disorders, evidenced by equal error rate increases of up to 1933%, with minimal overall impact on utility. Specific disorders such as dysarthria, dysphonia, and cleft lip and palate show minimal utility changes, while dysglossia shows slight improvements. Our findings underscore that the impact of anonymization varies substantially across disorders, necessitating disorder-specific anonymization strategies to optimally balance privacy with diagnostic utility. Additionally, our fairness analysis reveals consistent anonymization effects across most demographic groups. CONCLUSIONS: This study demonstrates the effectiveness of anonymization in enhancing the privacy of pathological speech, while also highlighting the importance of customized, disorder-specific approaches to guard against inversion attacks.
When someone's way of speaking is disrupted by health issues, making it hard for them to communicate clearly, it is described as pathological speech. Our study explores whether this type of speech can be modified to protect patient privacy without losing its ability to help diagnose health conditions. We evaluated automatic anonymization for over 2,700 speakers. The results show that these methods can substantially enhance privacy while still maintaining the usefulness of speech in medical diagnostics. This means we can keep speech data private while still being able to use it to identify health issues. However, our results also show that the effectiveness of these methods can vary depending on the specific condition being diagnosed. Our study provides a method that can help maintain patient privacy, while highlighting that further customized approaches will be required to ensure optimal privacy.
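In the study above, privacy is quantified by the equal error rate (EER) of a speaker-verification attacker: the higher the EER after anonymization, the harder re-identification becomes. As a minimal, self-contained sketch of how EER is computed from verification scores (the Gaussian toy scores below are illustrative, not from the study):

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """EER: operating point where false acceptance rate (FAR)
    equals false rejection rate (FRR)."""
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    best_gap, best_eer = 1.0, 0.0
    for t in thresholds:
        far = np.mean(impostor_scores >= t)  # impostors wrongly accepted
        frr = np.mean(genuine_scores < t)    # genuine speakers wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer

# Toy example: well-separated scores -> low EER; after anonymization the
# score distributions overlap more, so the EER rises.
rng = np.random.default_rng(0)
clear = equal_error_rate(rng.normal(2.0, 1.0, 1000), rng.normal(-2.0, 1.0, 1000))
anon = equal_error_rate(rng.normal(0.5, 1.0, 1000), rng.normal(-0.5, 1.0, 1000))
```

Here `clear` comes out far below `anon`, mirroring how anonymization pushes the attacker's EER up.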
ABSTRACT
Ultrasound examinations during pregnancy can detect abnormal fetal development, which is a leading cause of perinatal mortality. In multiple pregnancies, the positions of the fetuses may change between examinations, so the individual fetus cannot be reliably identified. Fetal re-identification may improve diagnostic capabilities by tracing changes in an individual fetus over time. This work evaluates the feasibility of fetal re-identification on FETAL_PLANES_DB, a publicly available dataset of singleton pregnancy ultrasound images. Five dataset subsets with 6,491 images from 1,088 pregnant women and two re-identification frameworks (Torchreid, FastReID) are evaluated. FastReID achieves a mean average precision of 68.77% (68.42%) and a mean precision-at-rank-10 score of 89.60% (95.55%) when trained on images showing the fetal brain (abdomen). Visualization with gradient-weighted class activation mapping shows that the classifiers appear to rely on anatomical features. We conclude that fetal re-identification in ultrasound images may be feasible. However, more work on additional datasets, including images from multiple pregnancies and several subsequent examinations, is required to ensure and investigate performance stability and explainability. Clinical relevance: To date, fetuses in multiple pregnancies cannot be distinguished between ultrasound examinations. This work provides the first evidence for the feasibility of fetal re-identification in pregnancy ultrasound images. This may improve diagnostic capabilities in clinical practice in the future, for example through longitudinal analysis of fetal changes or abnormalities.
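The retrieval metrics quoted above (mean average precision, precision at rank 10) can be sketched as follows, assuming each query yields a ranked boolean array marking which gallery images belong to the same fetus. This is illustrative only; Torchreid and FastReID compute these metrics internally:

```python
import numpy as np

def average_precision(relevant):
    """AP for one query; `relevant` is a boolean array in ranked order."""
    relevant = np.asarray(relevant, dtype=bool)
    hits = np.cumsum(relevant)
    precisions = hits / np.arange(1, len(relevant) + 1)
    # Average the precision values at the ranks of the correct matches
    return float(np.sum(precisions[relevant]) / max(relevant.sum(), 1))

def rank_k_hit(relevant, k=10):
    """CMC-style rank-k score: 1.0 if any correct match is in the top k."""
    return float(np.any(np.asarray(relevant, dtype=bool)[:k]))
```

Mean average precision and the rank-10 score are then the means of these per-query values over all queries.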
Subject(s)
Deep Learning, Prenatal Ultrasonography, Pregnancy, Humans, Female, Prenatal Ultrasonography/methods, Fetus/diagnostic imaging, Multiple Pregnancy, Ultrasonography
ABSTRACT
The evaluation of deep-learning (DL) systems typically relies on the area under the receiver operating characteristic curve (AU-ROC) as a performance metric. However, AU-ROC, in its holistic form, does not sufficiently consider performance within specific ranges of sensitivity and specificity, which are critical for the intended operational context of the system. Consequently, two systems with identical AU-ROC values can exhibit significantly divergent real-world performance. This issue is particularly pronounced in anomaly detection tasks, a common application of DL systems across research domains including medical imaging, industrial automation, manufacturing, cybersecurity, fraud detection, and drug research. The challenge arises from the heavy class imbalance in training datasets, with the abnormality class often incurring a considerably higher misclassification cost than the normal class. Traditional DL systems address this by adjusting the weighting of the cost function or by optimizing for specific points along the ROC curve. While these approaches yield reasonable results in many cases, they do not actively seek to maximize performance at the desired operating point. In this study, we introduce a novel technique, AUCReshaping, designed to reshape the ROC curve exclusively within a specified sensitivity and specificity range by optimizing sensitivity at a predetermined specificity level. This reshaping is achieved through an adaptive and iterative boosting mechanism that lets the network focus on pertinent samples during the learning process. We primarily investigated the impact of AUCReshaping on abnormality detection tasks, specifically chest X-ray (CXR) analysis, followed by breast mammogram and credit card fraud detection tasks. The results reveal a substantial improvement, ranging from 2% to 40%, in sensitivity at high-specificity levels for binary classification tasks.
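The boosting idea can be illustrated with a hypothetical per-sample re-weighting step (a sketch of the concept only; the exact AUCReshaping update may differ). Abnormal samples that the current model scores below the threshold implied by the target specificity receive a larger loss weight in the next iteration:

```python
import numpy as np

def reshaping_weights(scores, labels, target_specificity=0.95, boost=2.0):
    """Per-sample loss weights for the next training iteration.

    scores: model outputs in [0, 1]; labels: 0 = normal, 1 = abnormal.
    """
    neg_scores = scores[labels == 0]
    # Threshold at which `target_specificity` of normals are correctly rejected
    thr = np.quantile(neg_scores, target_specificity)
    weights = np.ones_like(scores)
    # Boost abnormal samples still missed at this operating point
    weights[(labels == 1) & (scores < thr)] = boost
    return weights
```

Applied iteratively between epochs, such weights steer optimization toward the sensitivity-at-specificity region of interest rather than the ROC curve as a whole.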
Subject(s)
Algorithms, Mammography, Sensitivity and Specificity, ROC Curve, Radiography
ABSTRACT
With the rise and ever-increasing potential of deep learning techniques in recent years, publicly available medical datasets have become a key factor in enabling reproducible development of diagnostic algorithms in the medical domain. Medical data contains sensitive patient-related information and is therefore usually anonymized by removing patient identifiers, e.g., patient names, before publication. To the best of our knowledge, we are the first to show that a well-trained deep learning system is able to recover the patient identity from chest X-ray data. We demonstrate this using the publicly available large-scale ChestX-ray14 dataset, a collection of 112,120 frontal-view chest X-ray images from 30,805 unique patients. Our verification system is able to identify whether two frontal chest X-ray images are from the same person with an AUC of 0.9940 and a classification accuracy of 95.55%. We further highlight that the proposed system is able to reveal the same person even ten or more years after the initial scan. When pursuing a retrieval approach, we observe an mAP@R of 0.9748 and a precision@1 of 0.9963. Furthermore, we achieve an AUC of up to 0.9870 and a precision@1 of up to 0.9444 when evaluating our trained networks on external datasets such as CheXpert and the COVID-19 Image Data Collection. Based on this high identification rate, a potential attacker may leak patient-related information and additionally cross-reference images to obtain more information. Thus, there is a great risk of sensitive content falling into unauthorized hands or being disseminated against the will of the concerned patients. Especially during the COVID-19 pandemic, numerous chest X-ray datasets have been published to advance research. Such data may therefore be vulnerable to attacks by deep learning-based re-identification algorithms.
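As an illustration of the retrieval setting described above, the following sketch scores image embeddings by cosine similarity and computes precision@1. The trained networks in the study would produce the embeddings; the arrays in the test are hypothetical stand-ins:

```python
import numpy as np

def precision_at_1(query_embs, gallery_embs, query_ids, gallery_ids):
    """Fraction of queries whose nearest gallery embedding
    (by cosine similarity) belongs to the same patient."""
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    top1 = np.argmax(q @ g.T, axis=1)  # closest gallery item per query
    return float(np.mean(gallery_ids[top1] == query_ids))
```

A high value of this metric is exactly what makes the privacy risk concrete: an attacker holding an auxiliary gallery of identified scans can link an "anonymized" image back to a patient with a single nearest-neighbor lookup.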