Results 1 - 20 of 272
1.
Front Plant Sci ; 15: 1360113, 2024.
Article in English | MEDLINE | ID: mdl-39351023

ABSTRACT

The rise of artificial intelligence (AI), and in particular modern machine learning (ML) algorithms, during the last decade has been met with great interest in the agricultural industry. While undisputedly powerful, their main drawback remains the need for sufficient and diverse training data. The collection of real datasets and their annotation are the main cost drivers of ML development, and while promising results on synthetically generated training data have been shown, generating such data is not without difficulties of its own. In this paper, we present a development model for the iterative, cost-efficient generation of synthetic training data. Its application is demonstrated by developing a low-cost early disease detector for tomato plants (Solanum lycopersicum) using synthetic training data. A neural classifier is trained exclusively on synthetic images, whose generation process is iteratively refined to obtain optimal performance. In contrast to other approaches that rely on a human assessment of similarity between real and synthetic data, we introduce a structured, quantitative approach. Our evaluation shows superior generalization compared to using non-task-specific real training data, and a higher cost efficiency of development compared to traditional synthetic training data. We believe that our approach will help to reduce the cost of synthetic data generation in future applications.

2.
BMC Pregnancy Childbirth ; 24(1): 628, 2024 Oct 01.
Article in English | MEDLINE | ID: mdl-39354367

ABSTRACT

OBJECTIVE: This study introduces the complete blood count (CBC), a standard prenatal screening test, as a biomarker for diagnosing preeclampsia with severe features (sPE), employing machine learning models. METHODS: We used a boosting machine learning model fed with synthetic data generated through a new methodology called DAS (Data Augmentation and Smoothing). Using data from a Brazilian study including 132 pregnant women, we generated 3,552 synthetic samples for model training. To improve interpretability, we also provided a ridge regression model. RESULTS: Our boosting model obtained an AUROC of 0.90±0.10, sensitivity of 0.95, and specificity of 0.79 to differentiate sPE and non-PE pregnant women, using the CBC parameters of neutrophil count, mean corpuscular hemoglobin (MCH), and the aggregate index of systemic inflammation (AISI). In addition, we provided a ridge regression equation using the same three CBC parameters, which is fully interpretable and achieved an AUROC of 0.79±0.10 to differentiate the two groups. Moreover, we also showed that a monocyte count lower than 490/mm³ yielded a sensitivity of 0.71 and specificity of 0.72. CONCLUSION: Our study showed that the ML-powered CBC could be used as a biomarker for sPE diagnosis support. In addition, we showed that a low monocyte count alone could be an indicator of sPE. SIGNIFICANCE: Although preeclampsia has been extensively studied, no laboratory biomarker with favorable cost-effectiveness has been proposed. Using artificial intelligence, we propose the CBC, a low-cost, fast, and widely available blood test, as a biomarker for sPE.
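The interpretable model described above is a ridge regression over three CBC parameters. As an illustrative sketch only (toy random data stands in for the study's cohort; the actual coefficients are not reproduced here), a closed-form ridge fit and an AUROC check look like this:

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge solution w = (X'X + alpha*I)^{-1} X'y, intercept unpenalized."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[-1, -1] = 0.0  # do not penalize the intercept
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

def ridge_score(X, w):
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w

def auroc(y, s):
    """Probability that a random positive scores above a random negative."""
    pos, neg = s[y == 1], s[y == 0]
    return float(np.mean([(p > n) + 0.5 * (p == n) for p in pos for n in neg]))

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the three CBC inputs: neutrophil count, MCH, AISI
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.5, -0.8, 2.0]) + rng.normal(scale=0.5, size=200) > 0).astype(float)

w = fit_ridge(X, y, alpha=1.0)
scores = ridge_score(X, w)
```

The appeal of such a model, as the abstract notes, is that the fitted equation can be written down and audited directly.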


Subject(s)
Biomarkers; Machine Learning; Pre-Eclampsia; Humans; Pre-Eclampsia/diagnosis; Pre-Eclampsia/blood; Female; Pregnancy; Biomarkers/blood; Blood Cell Count/methods; Adult; Sensitivity and Specificity; Brazil; Severity of Illness Index; ROC Curve; Prenatal Diagnosis/methods
3.
BMC Med Res Methodol ; 24(1): 198, 2024 Sep 09.
Article in English | MEDLINE | ID: mdl-39251921

ABSTRACT

In settings requiring synthetic data generation based on a clinical cohort, e.g., due to data protection regulations, heterogeneity across individuals might be a nuisance that we need to control or faithfully preserve. The sources of such heterogeneity might be known, e.g., as indicated by sub-group labels, or might be unknown and thus reflected only in properties of distributions, such as bimodality or skewness. We investigate how such heterogeneity can be preserved and controlled when obtaining synthetic data from variational autoencoders (VAEs), i.e., a generative deep learning technique that utilizes a low-dimensional latent representation. To faithfully reproduce unknown heterogeneity reflected in marginal distributions, we propose to combine VAEs with pre-transformations. For dealing with known heterogeneity due to sub-groups, we complement VAEs with models for group membership, specifically from propensity score regression. The evaluation is performed with a realistic simulation design that features sub-groups and challenging marginal distributions. The proposed approach faithfully recovers the latter, compared to synthetic data approaches that focus purely on marginal distributions. Propensity scores add complementary information, e.g., when visualized in the latent space, and enable sampling of synthetic data with or without sub-group specific characteristics. We also illustrate the proposed approach with real data from an international stroke trial that exhibits considerable distribution differences between study sites, in addition to bimodality. These results indicate that describing heterogeneity by statistical approaches, such as propensity score regression, might be more generally useful for complementing generative deep learning for obtaining synthetic data that faithfully reflects structure from clinical cohorts.
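The idea of pre-transforming a skewed marginal before it is fed to a VAE, and inverting the transform after sampling, can be illustrated with a minimal sketch. This uses a simple log transform on simulated lognormal data; the paper's actual pre-transformations may differ:

```python
import numpy as np

rng = np.random.default_rng(0)
# A skewed, clinical-style marginal that a plain VAE struggles to reproduce
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=2000)

# Pre-transformation applied before VAE training...
transformed = np.log(skewed)
# ...and its inverse, applied to samples drawn from the generator
restored = np.exp(transformed)

def sample_skewness(x):
    """Standardized third moment of a sample."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z**3))

skew_before = sample_skewness(skewed)
skew_after = sample_skewness(transformed)
```

After the transform, the marginal is approximately Gaussian (near-zero skewness), which is the regime where a VAE's Gaussian decoder assumptions hold best.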


Subject(s)
Propensity Score; Humans; Deep Learning; Algorithms; Computer Simulation
4.
Med Image Anal ; 99: 103344, 2024 Sep 11.
Article in English | MEDLINE | ID: mdl-39265361

ABSTRACT

Significant diagnostic variability between and within observers persists in pathology, despite the fact that digital slide images allow features to be measured and quantified far more precisely than conventional methods. Automated and accurate segmentation of cancerous cell and tissue regions can streamline the diagnostic process, provide insights into cancer progression, and help experts decide on the most effective treatment. Here, we evaluate the performance of the proposed PathoSeg model, with an architecture comprising a modified HRNet encoder and a UNet++ decoder integrated with a CBAM block, which uses an attention mechanism for improved segmentation capability. We demonstrate that PathoSeg outperforms the current state-of-the-art (SOTA) networks in both quantitative and qualitative assessment of instance and semantic segmentation. Notably, we leverage synthetic data generated by PathopixGAN, which effectively addresses the data imbalance problem commonly encountered in histopathology datasets, further improving the performance of PathoSeg. PathopixGAN uses spatially adaptive normalization within a generative and discriminative mechanism to synthesize diverse histopathological environments, guided by semantic information from pixel-level annotated ground-truth semantic masks. In addition, we contribute to the research community an in-house dataset that includes semantically segmented masks for breast carcinoma tubules (BCT), micro/macrovesicular steatosis of the liver (MSL), and prostate carcinoma glands (PCG). The first part of the dataset comprises 14 whole slide images of the livers of 13 patients, with fat-cell segmentation masks, totaling 951 masks of size 512 × 512 pixels. The second part includes 17 whole slide images from 13 patients with prostate carcinoma gland segmentation masks, amounting to 30,000 masks of size 512 × 512 pixels. The third part contains 51 whole slides from 36 patients, with breast carcinoma tubule masks totaling 30,000 masks of size 512 × 512 pixels. To ensure transparency and encourage further research, we will make this dataset publicly available for non-commercial and academic purposes. To facilitate reproducibility, we will also make our code and pre-trained models publicly available at https://github.com/DeepMIALab/PathoSeg.

6.
Stud Health Technol Inform ; 317: 270-279, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39234731

ABSTRACT

INTRODUCTION: A modern approach to ensuring privacy when sharing datasets is the use of synthetic data generation methods, which often claim to outperform classic anonymization techniques in the trade-off between data utility and privacy. Recently, it was demonstrated that various deep learning-based approaches are able to generate useful synthesized datasets, often based on domain-specific analyses. However, evaluating the privacy implications of releasing synthetic data remains a challenging problem, especially when the goal is to conform with data protection guidelines. METHODS: The recent privacy risk quantification framework Anonymeter was therefore built to evaluate multiple possible vulnerabilities, specifically the privacy risks considered by the European Data Protection Board, i.e., singling out, linkability, and attribute inference. This framework was applied to a synthetic data generation study from the epidemiological domain, where the synthesization replicates time and age trends previously found in data collected during the DONALD cohort study (1,312 participants, 16 time points). The conducted privacy analyses are presented, with a focus on the vulnerability of outliers. RESULTS: The resulting privacy scores are discussed; they vary greatly between the different types of attacks. CONCLUSION: Challenges encountered during their implementation and during the interpretation of their results are highlighted, and it is concluded that privacy risk assessment for synthetic data remains an open problem.


Subject(s)
Computer Security; Risk Assessment; Humans; Longitudinal Studies; Confidentiality; Privacy
7.
Physiol Meas ; 45(5)2024 May 28.
Article in English | MEDLINE | ID: mdl-39150768

ABSTRACT

Objective. Cardiovascular diseases are a major cause of mortality globally, and electrocardiograms (ECGs) are crucial for diagnosing them. Traditionally, ECGs are stored in printed formats. However, these printouts, even when scanned, are incompatible with advanced ECG diagnosis software that requires time-series data. Digitizing ECG images is vital for training machine learning models in ECG diagnosis, leveraging the extensive global archives collected over decades. Deep learning models for image processing are promising in this regard, although the lack of clinical ECG archives with reference time-series data is challenging. Data augmentation techniques using realistic generative data models provide a solution. Approach. We introduce ECG-Image-Kit, an open-source toolbox for generating synthetic multi-lead ECG images with realistic artifacts from time-series data, aimed at automating the conversion of scanned ECG images to ECG data points. The tool synthesizes ECG images from real time-series data, applying distortions like text artifacts, wrinkles, and creases on a standard ECG paper background. Main results. As a case study, we used ECG-Image-Kit to create a dataset of 21,801 ECG images from the PhysioNet QT database. We developed and trained a combination of a traditional computer vision and deep neural network model on this dataset to convert synthetic images into time-series data for evaluation. We assessed digitization quality by calculating the signal-to-noise ratio, and compared clinical parameters like QRS width and RR and QT intervals recovered from this pipeline with the ground truth extracted from the ECG time series. The results show that this deep learning pipeline accurately digitizes paper ECGs while preserving clinical parameters, and highlight a generative approach to digitization. Significance. The toolbox has broad applications, including model development for ECG image digitization and classification. It currently supports data augmentation for the 2024 PhysioNet Challenge, which focuses on digitizing and classifying paper ECG images.
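The digitization-quality check above reduces to a signal-to-noise ratio between the reference time series and the recovered one. A hedged sketch with a synthetic stand-in signal (not the toolbox's own code; the toolbox's exact SNR convention may differ):

```python
import numpy as np

def snr_db(reference, recovered):
    """SNR in dB, treating the residual (reference - recovered) as noise."""
    noise = reference - recovered
    return float(10.0 * np.log10(np.sum(reference**2) / np.sum(noise**2)))

t = np.linspace(0.0, 1.0, 500)
ecg_like = np.sin(2 * np.pi * 5 * t)  # stand-in for one reference ECG lead
# Stand-in for the pipeline's digitized output: reference plus small residual error
digitized = ecg_like + 0.01 * np.random.default_rng(1).normal(size=500)

quality_db = snr_db(ecg_like, digitized)
```

A higher value means the digitized trace is closer to the ground-truth time series.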


Subject(s)
Deep Learning; Electrocardiography; Image Processing, Computer-Assisted; Image Processing, Computer-Assisted/methods; Humans; Signal Processing, Computer-Assisted; Artifacts; Software
8.
BMC Med Res Methodol ; 24(1): 181, 2024 Aug 14.
Article in English | MEDLINE | ID: mdl-39143466

ABSTRACT

BACKGROUND: Synthetic Electronic Health Records (EHRs) are becoming increasingly popular as a privacy enhancing technology. However, for longitudinal EHRs specifically, little research has been done into how to properly evaluate synthetically generated samples. In this article, we provide a discussion of existing methods and recommendations for evaluating the quality of synthetic longitudinal EHRs. METHODS: We recommend assessing synthetic EHR quality through similarity to real EHRs in low-dimensional projections, the accuracy of a classifier discriminating synthetic from real samples, the performance of synthetically trained versus real trained algorithms in clinical tasks, and privacy risk through the risk of attribute inference. For each metric we discuss strengths and weaknesses, and show how it can be applied to a longitudinal dataset. RESULTS: To support the discussion of evaluation metrics, we apply the discussed metrics to a dataset of synthetic EHRs generated from the Medical Information Mart for Intensive Care-IV (MIMIC-IV) repository. CONCLUSIONS: The discussion of evaluation metrics provides guidance for researchers on how to use and interpret different metrics when evaluating the quality of synthetic longitudinal EHRs.
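One of the recommended metrics, the accuracy of a classifier that tries to discriminate synthetic from real samples, can be sketched as follows. This uses toy Gaussian data and a hand-rolled logistic regression (not the article's implementation); accuracy near 0.5 indicates the discriminator cannot tell the two apart:

```python
import numpy as np

def train_logreg(X, y, lr=0.1, steps=500):
    """Plain gradient-descent logistic regression, used as a real-vs-synthetic discriminator."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability of "synthetic"
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * float(np.mean(p - y))
    return w, b

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 4))
good_synth = rng.normal(0.0, 1.0, size=(500, 4))  # a well-matched generator's output

X = np.vstack([real, good_synth])
y = np.r_[np.zeros(500), np.ones(500)]  # 0 = real, 1 = synthetic
w, b = train_logreg(X, y)
acc = float(np.mean(((X @ w + b) > 0) == y.astype(bool)))
```

Here the two samples come from identical distributions, so the discriminator's accuracy stays near chance; a clearly higher accuracy would flag a detectable gap between real and synthetic EHRs.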


Subject(s)
Algorithms; Electronic Health Records; Electronic Health Records/statistics & numerical data; Electronic Health Records/standards; Humans; Longitudinal Studies; Privacy
9.
Front Bioeng Biotechnol ; 12: 1360330, 2024.
Article in English | MEDLINE | ID: mdl-39188371

ABSTRACT

There is increasing evidence that coronary artery wall shear stress (WSS) measurement provides useful prognostic information that allows prediction of adverse cardiovascular events. Computational Fluid Dynamics (CFD) has been extensively used in research to measure vessel physiology and examine the role of local haemodynamic forces in the evolution of atherosclerosis. Nonetheless, CFD modelling remains computationally expensive and time-consuming, making its direct use in clinical practice inconvenient. A number of studies have investigated the use of deep learning (DL) approaches for fast WSS prediction. However, in these reports, patient data were limited and most of them used synthetic data generation methods for developing the training set. In this paper, we implement two approaches for synthetic data generation and combine their output with real patient data in order to train a DL model with a U-net architecture for prediction of WSS in the coronary arteries. The model achieved 6.03% Normalised Mean Absolute Error (NMAE) with inference taking only 0.35 s, making this solution time-efficient and clinically relevant.
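The NMAE figure quoted above normalizes the mean absolute error, here assumed to be by the range of the ground-truth values (the paper may normalize differently). A minimal sketch on stand-in WSS fields:

```python
import numpy as np

def nmae_percent(y_true, y_pred):
    """Normalised mean absolute error, as a percentage of the true value range."""
    return 100.0 * float(np.mean(np.abs(y_true - y_pred)) / (y_true.max() - y_true.min()))

rng = np.random.default_rng(0)
wss_true = rng.uniform(0.5, 8.0, size=1000)              # stand-in CFD wall shear stress (Pa)
wss_pred = wss_true + rng.normal(scale=0.2, size=1000)   # stand-in network predictions

err = nmae_percent(wss_true, wss_pred)
```

Normalizing by the value range makes the error comparable across vessels with different absolute WSS magnitudes.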

10.
Comput Biol Med ; 180: 108943, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39096611

ABSTRACT

Gait analysis has proven to be a key process in the functional assessment of people across many fields, such as the diagnosis of diseases or rehabilitation, and has lately increased in relevance. Gait analysis often requires gathering data, although this can be very expensive and time-consuming. One of the main solutions applied in fields where data acquisition is a problem is the augmentation of datasets with artificial data. There are two main approaches for doing that: simulation and synthetic data generation. In this article, we propose a parametrizable generative system that produces synthetic walking sequences of simplified human skeletons. To achieve this, a data-gathering experiment with up to 26 individuals was conducted. The system consists of two artificial neural networks: a recurrent neural network for the generation of the movement and a multilayer perceptron for determining the size of the segments of the skeletons. The system has been evaluated through four processes: (i) an observational appraisal by researchers in gait analysis, (ii) a visual representation of the distribution of the generated data, (iii) a numerical analysis using the normalized cross-correlation coefficient, and (iv) an angular evaluation to check the kinematic validity of the data. The evaluation concluded that the system is able to generate realistic and accurate gait data. These results reveal a promising path for this research field, which can be further improved by increasing the variety of movements and the user sample.
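The normalized cross-correlation coefficient used in evaluation step (iii) can be sketched as follows, on synthetic stand-in joint-angle curves (an assumed zero-normalized definition; the article's exact variant is not specified):

```python
import numpy as np

def ncc(a, b):
    """Zero-normalized cross-correlation coefficient between two equal-length signals."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float(np.mean(a * b))  # 1.0 = identical shape, 0.0 = uncorrelated

t = np.linspace(0.0, 2 * np.pi, 200)
real_angle = np.sin(t)                          # stand-in for a measured joint-angle curve
synth_angle = np.sin(t) + 0.05 * np.cos(3 * t)  # generated curve with a small deviation

score = ncc(real_angle, synth_angle)
```

Values close to 1 indicate that the generated gait cycle follows the shape of the real one, independently of amplitude scaling.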


Subject(s)
Neural Networks, Computer; Humans; Gait/physiology; Models, Biological; Biomechanical Phenomena/physiology; Male; Walking/physiology; Female
11.
Brain Sci ; 14(8)2024 Aug 17.
Article in English | MEDLINE | ID: mdl-39199519

ABSTRACT

(1) Background: Functional magnetic resonance imaging (fMRI) utilizing multi-echo gradient echo-planar imaging (ME-GE-EPI) has demonstrated higher sensitivity and stability compared to single-echo gradient echo-planar imaging (SE-GE-EPI). The direct derivation of T2* maps from fitting multi-echo data enables accurate recording of dynamic functional changes in the brain, exhibiting higher sensitivity than echo combination maps. However, the widely employed voxel-wise log-linear fitting (LLF) is susceptible to inevitable noise accumulation during image acquisition. (2) Methods: This work introduces a synthetic data-driven deep learning (SD-DL) method to obtain T2* maps for multi-echo (ME) fMRI analysis. (3) Results: The experimental results showed efficient enhancement of the temporal signal-to-noise ratio (tSNR), improved task-based blood oxygen level-dependent (BOLD) percentage signal change, and enhanced performance in multi-echo independent component analysis (MEICA) using the proposed method. (4) Conclusion: T2* maps derived from ME-fMRI data using the proposed SD-DL method exhibit enhanced BOLD sensitivity in comparison to T2* maps derived from the LLF method.

12.
Comput Struct Biotechnol J ; 23: 2892-2910, 2024 Dec.
Article in English | MEDLINE | ID: mdl-39108677

ABSTRACT

Synthetic data generation has emerged as a promising solution to overcome the challenges posed by data scarcity and privacy concerns, as well as the need to train artificial intelligence (AI) algorithms on unbiased data with sufficient sample size and statistical power. Our review explores the application and efficacy of synthetic data methods in healthcare, considering the diversity of medical data. To this end, we systematically searched the PubMed and Scopus databases with a focus on tabular, imaging, radiomics, time-series, and omics data. Studies involving multi-modal synthetic data generation were also explored. The type of method used for the synthetic data generation process was identified in each study and was categorized as statistical, probabilistic, machine learning, or deep learning. Emphasis was given to the programming languages used for the implementation of each method. Our evaluation revealed that the majority of the studies utilize synthetic data generators to: (i) reduce the cost and time required for clinical trials for rare diseases and conditions, (ii) enhance the predictive power of AI models in personalized medicine, (iii) ensure the delivery of fair treatment recommendations across diverse patient populations, and (iv) enable researchers to access high-quality, representative multimodal datasets without exposing sensitive patient information, among others. We underline the wide use of deep learning-based synthetic data generators in 72.6% of the included studies, with 75.3% of the generators being implemented in Python. A thorough documentation of open-source repositories is finally provided to accelerate research in the field.

13.
Sci Rep ; 14(1): 19064, 2024 08 17.
Article in English | MEDLINE | ID: mdl-39154144

ABSTRACT

This study addresses challenges related to privacy in utilizing medical data, particularly the protection of personal information. To overcome this obstacle, the research focuses on data synthesis using a real-world time-series generative adversarial network (RTSGAN). A total of 53,005 synthetic records were generated from a dataset of 15,799 patients with colorectal cancer. The results of the quantitative evaluation of the synthetic data's quality are as follows: the Hellinger distance ranged from 0 to 0.25; the train-on-synthetic, test-on-real (TSTR) and train-on-real, test-on-synthetic (TRTS) evaluations showed average areas under the curve of 0.99 and 0.98, respectively; and the propensity mean squared error was 0.223. The synthetic and real data were also similar under qualitative methods, including t-SNE and histogram analyses. The application of the synthetic data to predicting five-year survival in colorectal cancer patients demonstrates performance comparable to models based on real data. This study employs distance-to-closest-record and membership inference tests to assess potential privacy exposure, revealing minimal risk. This study demonstrated that it is feasible to synthesize medical data, including time-series data, using the RTSGAN, and that the synthetic data can be shown to accurately reflect the characteristics of real data through quantitative and qualitative methods as well as real-world artificial intelligence models.
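The Hellinger distance quoted above (0 to 0.25) compares real and synthetic marginal distributions. A minimal sketch on binned toy data (the binning is an assumption; the study's implementation is not specified):

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    p = np.asarray(p, dtype=float); q = np.asarray(q, dtype=float)
    p = p / p.sum(); q = q / q.sum()  # normalize counts to probabilities
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q))**2)))

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, 5000)
synthetic = rng.normal(0.1, 1.1, 5000)  # a slightly mismatched synthetic marginal

bins = np.linspace(-5.0, 5.0, 41)
h = hellinger(np.histogram(real, bins)[0], np.histogram(synthetic, bins)[0])
```

A value near 0 means the synthetic marginal closely tracks the real one; the study's reported range of 0 to 0.25 would correspond to close-to-moderate matches per variable.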


Subject(s)
Colorectal Neoplasms; Humans; Neural Networks, Computer
14.
Front Artif Intell ; 7: 1336320, 2024.
Article in English | MEDLINE | ID: mdl-39185366

ABSTRACT

(1) Background: The COVID-19 pandemic highlighted the need for accurate virtual sizing in e-commerce to reduce returns and waste. Existing methods for extracting anthropometric data from images have limitations. This study aims to develop a semantic segmentation model trained on synthetic data that can accurately determine body shape from real images, accounting for clothing. Methods: A synthetic dataset of over 22,000 images was created using NVIDIA Omniverse Replicator, featuring human models in various poses, clothing, and environments. Popular CNN architectures (U-Net, SegNet, DeepLabV3, PSPNet) with different backbones were trained on this dataset for semantic segmentation. Models were evaluated on accuracy, precision, recall, and IoU metrics. The best performing model was tested on real human subjects and compared to actual measurements. Results: U-Net with an EfficientNet backbone showed the best performance, with 99.83% training accuracy and a 0.977 IoU score. When tested on real images, it accurately segmented body shape while accounting for clothing. Comparison with actual measurements on 9 subjects showed average deviations of -0.24 cm for neck, -0.1 cm for shoulder, 1.15 cm for chest, -0.22 cm for waist, and 0.17 cm for hip measurements. Discussion: The synthetic dataset and trained models enable accurate extraction of anthropometric data from real images while accounting for clothing. This approach has significant potential for improving virtual fitting and reducing returns in e-commerce. Future work will focus on refining the algorithm, particularly for the waist and hip measurements, which showed higher variability.

15.
MAGMA ; 2024 Aug 29.
Article in English | MEDLINE | ID: mdl-39207581

ABSTRACT

OBJECT: Deep learning has shown great promise for fast reconstruction of accelerated MRI acquisitions by learning from large amounts of raw data. However, raw data is not always available in sufficient quantities. This study investigates synthetic data generation to complement small datasets and improve reconstruction quality. MATERIALS AND METHODS: An adversarial auto-encoder was trained to generate phase and coil sensitivity maps from magnitude images, which were combined into synthetic raw data. On a fourfold accelerated MR reconstruction task, deep-learning-based reconstruction networks were trained with varying amounts of training data (20 to 160 scans). Test set performance was compared between baseline experiments and experiments that incorporated synthetic training data. RESULTS: Training with synthetic raw data showed decreasing reconstruction errors with increasing amounts of training data; importantly, this synthetic data was derived from magnitude-only images rather than real raw data. For small training sets, training with synthetic data decreased the mean absolute error (MAE) by up to 7.5%, whereas for larger training sets the MAE increased by up to 2.6%. DISCUSSION: Synthetic raw data generation improved reconstruction quality in scenarios with limited training data. A major advantage of synthetic data generation is that it allows for the reuse of magnitude-only datasets, which are more readily available than raw datasets.
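The core idea, building synthetic complex raw data from magnitude-only images plus generated phase, can be sketched as follows. Here a uniform random phase stands in for the adversarial auto-encoder's output, and a simple line-skipping mask stands in for the fourfold undersampling pattern (both assumptions, not the study's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
magnitude = np.abs(rng.normal(size=(64, 64)))  # stand-in for a magnitude-only image

# Stand-in for the phase map the adversarial auto-encoder would generate
synthetic_phase = rng.uniform(-np.pi, np.pi, size=(64, 64))

# Combine magnitude and generated phase into a synthetic complex image,
# then move it to k-space to obtain synthetic "raw" data
complex_image = magnitude * np.exp(1j * synthetic_phase)
kspace = np.fft.fft2(complex_image)

# Fourfold acceleration: keep every 4th phase-encode line
mask = np.zeros(64, dtype=bool)
mask[::4] = True
undersampled = kspace * mask[:, None]
```

The reconstruction network is then trained to map `undersampled` back to `complex_image`, exactly as it would be trained on real raw data.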

16.
Biomedicines ; 12(8)2024 Aug 15.
Article in English | MEDLINE | ID: mdl-39200323

ABSTRACT

(1) Background: Liver metastases (LM) are the leading cause of death in colorectal cancer (CRC) patients. Despite advancements, relapse rates remain high and current prognostic nomograms lack accuracy. Our objective is to develop an interpretable neoadjuvant algorithm based on mathematical models to accurately predict individual risk, ensuring mathematical transparency and auditability. (2) Methods: We retrospectively evaluated 86 CRC patients with LM treated with neoadjuvant systemic therapy followed by complete surgical resection. A comprehensive analysis of 155 individual patient variables was performed. Logistic regression (LR) was utilized to develop the predictive model for relapse risk through significance testing and ANOVA. Due to data limitations, a gradient boosting machine (GBM) and synthetic data were also used. (3) Results: The model was based on data from 74 patients (12 were excluded). After a median follow-up of 58 months, the 5-year relapse-free survival (RFS) rate was 33% and the 5-year overall survival (OS) rate was 60.7%. Fifteen key variables were used to train the GBM model, which showed promising accuracy (0.82), sensitivity (0.59), and specificity (0.96) in predicting relapse. Similar results were obtained in external validation. (4) Conclusions: This model offers an alternative for predicting individual relapse risk, aiding in personalized adjuvant therapy and follow-up strategies.

17.
Stud Health Technol Inform ; 316: 929-933, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176944

ABSTRACT

Predictive modeling holds a large potential in clinical decision-making, yet its effectiveness can be hindered by inherent data imbalances in clinical datasets. This study investigates the utility of synthetic data for improving the performance of predictive modeling on realistic small imbalanced clinical datasets. We compared various synthetic data generation methods including Generative Adversarial Networks, Normalizing Flows, and Variational Autoencoders to the standard baselines for correcting for class underrepresentation on four clinical datasets. Although results show improvement in F1 scores in some cases, even over multiple repetitions, we do not obtain statistically significant evidence that synthetic data generation outperforms standard baselines for correcting for class imbalance. This study challenges common beliefs about the efficacy of synthetic data for data augmentation and highlights the importance of evaluating new complex methods against simple baselines.
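A typical simple baseline for class underrepresentation, of the kind the synthetic generators are compared against here, is random oversampling of the minority class. A minimal sketch (toy data; the study's exact baselines are not specified):

```python
import numpy as np

def random_oversample(X, y, rng):
    """Duplicate minority-class rows until all classes are balanced."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    parts_X, parts_y = [], []
    for c, n in zip(classes, counts):
        idx = np.where(y == c)[0]
        extra = rng.choice(idx, size=n_max - n, replace=True)  # resample to fill the gap
        sel = np.concatenate([idx, extra])
        parts_X.append(X[sel])
        parts_y.append(y[sel])
    return np.vstack(parts_X), np.concatenate(parts_y)

rng = np.random.default_rng(0)
X = rng.normal(size=(110, 4))
y = np.r_[np.zeros(100), np.ones(10)]  # 10:1 class imbalance

X_bal, y_bal = random_oversample(X, y, rng)
```

The study's point is that complex synthetic generators should be required to beat baselines this simple before their extra cost is justified.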


Subject(s)
Clinical Decision-Making; Humans
18.
Stud Health Technol Inform ; 316: 963-967, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176952

ABSTRACT

Synthetic tabular health data plays a crucial role in healthcare research, addressing privacy regulations and the scarcity of publicly available datasets. This is essential for diagnostic and treatment advancements. Among the most promising models are transformer-based Large Language Models (LLMs) and Generative Adversarial Networks (GANs). In this paper, we compare LLMs of the Pythia LLM Scaling Suite, with model sizes ranging from 14M to 1B parameters, against a reference GAN model (CTGAN). The generated synthetic data are used to train random forest estimators for classification tasks, which then make predictions on the real-world data. Our findings indicate that as the number of parameters increases, the LLMs outperform the reference GAN model. Even the smallest 14M-parameter models perform comparably to GANs. Moreover, we observe a positive correlation between the size of the training dataset and model performance. We discuss implications, challenges, and considerations for the real-world usage of LLMs for synthetic tabular data generation.
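The evaluation protocol here, training an estimator on synthetic data and scoring it on real data (train-on-synthetic, test-on-real), can be sketched with a toy stand-in classifier. A nearest-centroid model replaces the paper's random forest, and Gaussian blobs replace the tabular health data:

```python
import numpy as np

def nearest_centroid_fit(X, y):
    """Tiny stand-in estimator: one centroid per class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_centroid_predict(model, X):
    classes = sorted(model)
    dists = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes], axis=1)
    return np.array(classes)[np.argmin(dists, axis=1)]

rng = np.random.default_rng(0)
# Real data: two classes with shifted means; synthetic data mimics the same structure
real_X = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(2, 1, (100, 3))])
real_y = np.r_[np.zeros(100), np.ones(100)]
synth_X = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(2, 1, (100, 3))])
synth_y = np.r_[np.zeros(100), np.ones(100)]

model = nearest_centroid_fit(synth_X, synth_y)                          # train on synthetic
tstr_acc = float(np.mean(nearest_centroid_predict(model, real_X) == real_y))  # test on real
```

If the generator captures the class structure of the real data, accuracy on the real test set stays close to what a real-trained model would achieve.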


Subject(s)
Benchmarking; Computer Simulation
19.
Stud Health Technol Inform ; 316: 1145-1150, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176583

ABSTRACT

Advances in general-purpose computers have enabled the generation of high-quality synthetic medical images that human eyes cannot distinguish from real images. To analyse the efficacy of generated medical images, this study proposes a modified VGG16-based algorithm to recognise AI-generated medical images. Initially, 10,000 synthetic medical skin lesion images were generated using a Generative Adversarial Network (GAN), providing a set of images for comparison to real images. Then, an enhanced VGG16-based algorithm was developed to classify real versus AI-generated images. Following hyperparameter tuning and training, the optimal approach classified the images with 99.82% accuracy. Multiple other evaluations were used to assess the efficacy of the proposed network. The complete dataset used in this study is available online to the research community for future research.


Subject(s)
Deep Learning; Humans; Algorithms; Skin Diseases/diagnostic imaging; Image Interpretation, Computer-Assisted/methods; Skin Neoplasms/diagnostic imaging
20.
Stud Health Technol Inform ; 316: 1224-1225, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176601

ABSTRACT

The identification of vulnerable records (targets) is an important step for many privacy attacks on protected health data. We implemented and evaluated three outlier metrics for detecting potential targets. Next, we assessed differences and similarities between the top-k targets suggested by the different methods and studied how susceptible those targets are to membership inference attacks on synthetic data. Our results suggest that there is no one-size-fits-all approach and that target selection methods should be chosen based on the type of attack that is to be performed.
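One common outlier metric for selecting vulnerable records is the mean distance to a record's k nearest neighbours; isolated records score high and make natural attack targets. A minimal sketch on toy records (one of several possible metrics; the study's three metrics are not named in the abstract):

```python
import numpy as np

def knn_outlier_scores(X, k=3):
    """Score each record by its mean distance to its k nearest neighbours."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    np.fill_diagonal(d, np.inf)                                 # ignore self-distance
    nearest = np.sort(d, axis=1)[:, :k]
    return nearest.mean(axis=1)

rng = np.random.default_rng(0)
records = rng.normal(size=(100, 5))
records[0] = 8.0  # one planted outlier record, far from the bulk

scores = knn_outlier_scores(records)
top_k_targets = np.argsort(scores)[::-1][:5]  # the k most isolated records
```

These top-scoring records are the candidates whose susceptibility to membership inference the study then measures.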


Subject(s)
Computer Security; Confidentiality; Electronic Health Records; Humans