Pesquisa | BVS Violência e Saúde

1.

Towards multi-omics synthetic data integration.

Selvarajoo, Kumar; Maurer-Stroh, Sebastian.

Brief Bioinform ; 25(3)2024 Mar 27.

Artigo em Inglês | MEDLINE | ID: mdl-38711370

RESUMO

Across many scientific disciplines, the development of computational models and algorithms for generating artificial or synthetic data is gaining momentum. In biology, there is a great opportunity to explore this further as more and more big data at multi-omics level are generated recently. In this opinion, we discuss the latest trends in biological applications based on process-driven and data-driven aspects. Moving ahead, we believe these methodologies can help shape novel multi-omics-scale cellular inferences.

Assuntos

Algoritmos , Biologia Computacional , Biologia Computacional/métodos , Genômica/métodos , Humanos , Big Data , Proteômica/métodos , Multiômica

2.

Assessing the feasibility of statistical inference using synthetic antibody-antigen datasets.

Minotto, Thomas; Robert, Philippe A; Hobæk Haff, Ingrid; Sandve, Geir K.

Stat Appl Genet Mol Biol ; 23(1)2024 Jan 01.

Artigo em Inglês | MEDLINE | ID: mdl-38563699

RESUMO

Simulation frameworks are useful to stress-test predictive models when data is scarce, or to assert model sensitivity to specific data distributions. Such frameworks often need to recapitulate several layers of data complexity, including emergent properties that arise implicitly from the interaction between simulation components. Antibody-antigen binding is a complex mechanism by which an antibody sequence wraps itself around an antigen with high affinity. In this study, we use a synthetic simulation framework for antibody-antigen folding and binding on a 3D lattice that include full details on the spatial conformation of both molecules. We investigate how emergent properties arise in this framework, in particular the physical proximity of amino acids, their presence on the binding interface, or the binding status of a sequence, and relate that to the individual and pairwise contributions of amino acids in statistical models for binding prediction. We show that weights learnt from a simple logistic regression model align with some but not all features of amino acids involved in the binding, and that predictive sequence binding patterns can be enriched. In particular, main effects correlated with the capacity of a sequence to bind any antigen, while statistical interactions were related to sequence specificity.

Assuntos

Anticorpos , Antifibrinolíticos , Estudos de Viabilidade , Vacinas Sintéticas , Aminoácidos

3.

A transformer model for cause-specific hazard prediction.

Oliver, Matthieu; Allou, Nicolas; Devineau, Marjolaine; Allyn, Jèrôme; Ferdynus, Cyril.

BMC Bioinformatics ; 25(1): 175, 2024 May 03.

Artigo em Inglês | MEDLINE | ID: mdl-38702609

RESUMO

BACKGROUD: Modelling discrete-time cause-specific hazards in the presence of competing events and non-proportional hazards is a challenging task in many domains. Survival analysis in longitudinal cohorts often requires such models; notably when the data is gathered at discrete points in time and the predicted events display complex dynamics. Current models often rely on strong assumptions of proportional hazards, that is rarely verified in practice; or do not handle sequential data in a meaningful way. This study proposes a Transformer architecture for the prediction of cause-specific hazards in discrete-time competing risks. Contrary to Multilayer perceptrons that were already used for this task (DeepHit), the Transformer architecture is especially suited for handling complex relationships in sequential data, having displayed state-of-the-art performance in numerous tasks with few underlying assumptions on the task at hand. RESULTS: Using synthetic datasets of 2000-50,000 patients, we showed that our Transformer model surpassed the CoxPH, PyDTS, and DeepHit models for the prediction of cause-specific hazard, especially when the proportional assumption did not hold. The error along simulated time outlined the ability of our model to anticipate the evolution of cause-specific hazards at later time steps where few events are observed. It was also superior to current models for prediction of dementia and other psychiatric conditions in the English longitudinal study of ageing cohort using the integrated brier score and the time-dependent concordance index. We also displayed the explainability of our model's prediction using the integrated gradients method. CONCLUSIONS: Our model provided state-of-the-art prediction of cause-specific hazards, without adopting prior parametric assumptions on the hazard rates. It outperformed other models in non-proportional hazards settings for both the synthetic dataset and the longitudinal cohort study. We also observed that basic models such as CoxPH were more suited to extremely simple settings than deep learning models. Our model is therefore especially suited for survival analysis on longitudinal cohorts with complex dynamics of the covariate-to-outcome relationship, which are common in clinical practice. The integrated gradients provided the importance scores of input variables, which indicated variables guiding the model in its prediction. This model is ready to be utilized for time-to-event prediction in longitudinal cohorts.

Assuntos

Modelos de Riscos Proporcionais , Humanos , Análise de Sobrevida

4.

Synthetic Data and its Utility in Pathology and Laboratory Medicine.

Pantanowitz, Joshua; Manko, Christopher D; Pantanowitz, Liron; Rashidi, Hooman H.

Lab Invest ; 104(8): 102095, 2024 Jun 24.

Artigo em Inglês | MEDLINE | ID: mdl-38925488

RESUMO

In our rapidly expanding landscape of artificial intelligence, synthetic data have become a topic of great promise and also some concern. This review aimed to provide pathologists and laboratory professionals with a primer on the role of synthetic data and how it may soon shape the landscape within our field. Using synthetic data presents many advantages but also introduces a milieu of new obstacles and limitations. This review aimed to provide pathologists and laboratory professionals with a primer on the general concept of synthetic data and its potential to transform our field. By leveraging synthetic data, we can help accelerate the development of various machine learning models and enhance our medical education and research/quality study needs. This review explored the methods for generating synthetic data, including rule-based, machine learning model-based and hybrid approaches, as they apply to applications within pathology and laboratory medicine. We also discussed the limitations and challenges associated with such synthetic data, including data quality, malicious use, and ethical bias/concerns and challenges. By understanding the potential benefits (ie, medical education, training artificial intelligence programs, and proficiency testing, etc) and limitations of this new data realm, we can not only harness its power to improve patient outcomes, advance research, and enhance the practice of pathology but also become readily aware of their intrinsic limitations.

5.

SPINNED: Simulation-based physics-informed neural network for deconvolution of dynamic susceptibility contrast MRI perfusion data.

Asaduddin, Muhammad; Kim, Eung Yeop; Park, Sung-Hong.

Magn Reson Med ; 92(3): 1205-1218, 2024 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-38623911

RESUMO

PURPOSE: To propose the simulation-based physics-informed neural network for deconvolution of dynamic susceptibility contrast (DSC) MRI (SPINNED) as an alternative for more robust and accurate deconvolution compared to existing methods. METHODS: The SPINNED method was developed by generating synthetic tissue residue functions and arterial input functions through mathematical simulations and by using them to create synthetic DSC MRI time series. The SPINNED model was trained using these simulated data to learn the underlying physical relation (deconvolution) between the DSC-MRI time series and the arterial input functions. The accuracy and robustness of the proposed SPINNED method were assessed by comparing it with two common deconvolution methods in DSC MRI data analysis, circulant singular value decomposition, and Volterra singular value decomposition, using both simulation data and real patient data. RESULTS: The proposed SPINNED method was more accurate than the conventional methods across all SNR levels and showed better robustness against noise in both simulation and real patient data. The SPINNED method also showed much faster processing speed than the conventional methods. CONCLUSION: These results support that the proposed SPINNED method can be a good alternative to the existing methods for resolving the deconvolution problem in DSC MRI. The proposed method does not require any separate ground-truth measurement for training and offers additional benefits of quick processing time and coverage of diverse clinical scenarios. Consequently, it will contribute to more reliable, accurate, and rapid diagnoses in clinical applications compared with the previous methods including those based on supervised learning.

Assuntos

Algoritmos , Simulação por Computador , Meios de Contraste , Processamento de Imagem Assistida por Computador , Imageamento por Ressonância Magnética , Redes Neurais de Computação , Humanos , Imageamento por Ressonância Magnética/métodos , Processamento de Imagem Assistida por Computador/métodos , Meios de Contraste/química , Encéfalo/diagnóstico por imagem , Razão Sinal-Ruído

6.

Learning debiased graph representations from the OMOP common data model for synthetic data generation.

Schulz, Nicolas Alexander; Carus, Jasmin; Wiederhold, Alexander Johannes; Johanns, Ole; Peters, Frederik; Rath, Natalie; Rausch, Katharina; Holleczek, Bernd; Katalinic, Alexander; Gundler, Christopher.

BMC Med Res Methodol ; 24(1): 136, 2024 Jun 22.

Artigo em Inglês | MEDLINE | ID: mdl-38909216

RESUMO

BACKGROUND: Generating synthetic patient data is crucial for medical research, but common approaches build up on black-box models which do not allow for expert verification or intervention. We propose a highly available method which enables synthetic data generation from real patient records in a privacy preserving and compliant fashion, is interpretable and allows for expert intervention. METHODS: Our approach ties together two established tools in medical informatics, namely OMOP as a data standard for electronic health records and Synthea as a data synthetization method. For this study, data pipelines were built which extract data from OMOP, convert them into time series format, learn temporal rules by 2 statistical algorithms (Markov chain, TARM) and 3 algorithms of causal discovery (DYNOTEARS, J-PCMCI+, LiNGAM) and map the outputs into Synthea graphs. The graphs are evaluated quantitatively by their individual and relative complexity and qualitatively by medical experts. RESULTS: The algorithms were found to learn qualitatively and quantitatively different graph representations. Whereas the Markov chain results in extremely large graphs, TARM, DYNOTEARS, and J-PCMCI+ were found to reduce the data dimension during learning. The MultiGroupDirect LiNGAM algorithm was found to not be applicable to the problem statement at hand. CONCLUSION: Only TARM and DYNOTEARS are practical algorithms for real-world data in this use case. As causal discovery is a method to debias purely statistical relationships, the gradient-based causal discovery algorithm DYNOTEARS was found to be most suitable.

Assuntos

Algoritmos , Registros Eletrônicos de Saúde , Humanos , Registros Eletrônicos de Saúde/estatística & dados numéricos , Registros Eletrônicos de Saúde/normas , Cadeias de Markov , Informática Médica/métodos , Informática Médica/estatística & dados numéricos

7.

A dynamical model for generating synthetic data to quantify active tactile sensing behavior in the rat.

Zweifel, Nadina O; Bush, Nicholas E; Abraham, Ian; Murphey, Todd D; Hartmann, Mitra J Z.

Proc Natl Acad Sci U S A ; 118(27)2021 07 06.

Artigo em Inglês | MEDLINE | ID: mdl-34210794

RESUMO

As it becomes possible to simulate increasingly complex neural networks, it becomes correspondingly important to model the sensory information that animals actively acquire: the biomechanics of sensory acquisition directly determines the sensory input and therefore neural processing. Here, we exploit the tractable mechanics of the well-studied rodent vibrissal ("whisker") system to present a model that can simulate the signals acquired by a full sensor array actively sampling the environment. Rodents actively "whisk" â¼60 vibrissae (whiskers) to obtain tactile information, and this system is therefore ideal to study closed-loop sensorimotor processing. The simulation framework presented here, WHISKiT Physics, incorporates realistic morphology of the rat whisker array to predict the time-varying mechanical signals generated at each whisker base during sensory acquisition. Single-whisker dynamics were optimized based on experimental data and then validated against free tip oscillations and dynamic responses to collisions. The model is then extrapolated to include all whiskers in the array, incorporating each whisker's individual geometry. Simulation examples in laboratory and natural environments demonstrate that WHISKiT Physics can predict input signals during various behaviors, currently impossible in the biological animal. In one exemplary use of the model, the results suggest that active whisking increases in-plane whisker bending compared to passive stimulation and that principal component analysis can reveal the relative contributions of whisker identity and mechanics at each whisker base to the vibrissotactile response. These results highlight how interactions between array morphology and individual whisker geometry and dynamics shape the signals that the brain must process.

Assuntos

Comportamento Animal/fisiologia , Modelos Neurológicos , Tato/fisiologia , Animais , Estimulação Física , Ratos , Transdução de Sinais , Fatores de Tempo , Vibrissas/fisiologia

8.

A novel generative adversarial networks modelling for the class imbalance problem in high dimensional omics data.

Cusworth, Samuel; Gkoutos, Georgios V; Acharjee, Animesh.

BMC Med Inform Decis Mak ; 24(1): 90, 2024 Mar 28.

Artigo em Inglês | MEDLINE | ID: mdl-38549123

RESUMO

Class imbalance remains a large problem in high-throughput omics analyses, causing bias towards the over-represented class when training machine learning-based classifiers. Oversampling is a common method used to balance classes, allowing for better generalization of the training data. More naive approaches can introduce other biases into the data, being especially sensitive to inaccuracies in the training data, a problem considering the characteristically noisy data obtained in healthcare. This is especially a problem with high-dimensional data. A generative adversarial network-based method is proposed for creating synthetic samples from small, high-dimensional data, to improve upon other more naive generative approaches. The method was compared with 'synthetic minority over-sampling technique' (SMOTE) and 'random oversampling' (RO). Generative methods were validated by training classifiers on the balanced data.

Assuntos

Aprendizado de Máquina , Viés

9.

Collaborative learning from distributed data with differentially private synthetic data.

Prediger, Lukas; Jälkö, Joonas; Honkela, Antti; Kaski, Samuel.

BMC Med Inform Decis Mak ; 24(1): 167, 2024 Jun 14.

Artigo em Inglês | MEDLINE | ID: mdl-38877563

RESUMO

BACKGROUND: Consider a setting where multiple parties holding sensitive data aim to collaboratively learn population level statistics, but pooling the sensitive data sets is not possible due to privacy concerns and parties are unable to engage in centrally coordinated joint computation. We study the feasibility of combining privacy preserving synthetic data sets in place of the original data for collaborative learning on real-world health data from the UK Biobank. METHODS: We perform an empirical evaluation based on an existing prospective cohort study from the literature. Multiple parties were simulated by splitting the UK Biobank cohort along assessment centers, for which we generate synthetic data using differentially private generative modelling techniques. We then apply the original study's Poisson regression analysis on the combined synthetic data sets and evaluate the effects of 1) the size of local data set, 2) the number of participating parties, and 3) local shifts in distributions, on the obtained likelihood scores. RESULTS: We discover that parties engaging in the collaborative learning via shared synthetic data obtain more accurate estimates of the regression parameters compared to using only their local data. This finding extends to the difficult case of small heterogeneous data sets. Furthermore, the more parties participate, the larger and more consistent the improvements become up to a certain limit. Finally, we find that data sharing can especially help parties whose data contain underrepresented groups to perform better-adjusted analysis for said groups. CONCLUSIONS: Based on our results we conclude that sharing of synthetic data is a viable method for enabling learning from sensitive data without violating privacy constraints even if individual data sets are small or do not represent the overall population well. Lack of access to distributed sensitive data is often a bottleneck in biomedical research, which our study shows can be alleviated with privacy-preserving collaborative learning methods.

Assuntos

Disseminação de Informação , Humanos , Reino Unido , Comportamento Cooperativo , Confidencialidade/normas , Privacidade , Bancos de Espécimes Biológicos , Estudos Prospectivos

10.

Comparative assessment of synthetic time series generation approaches in healthcare: leveraging patient metadata for accurate data synthesis.

Isasa, Imanol; Hernandez, Mikel; Epelde, Gorka; Londoño, Francisco; Beristain, Andoni; Larrea, Xabat; Alberdi, Ane; Bamidis, Panagiotis; Konstantinidis, Evdokimos.

BMC Med Inform Decis Mak ; 24(1): 27, 2024 Jan 30.

Artigo em Inglês | MEDLINE | ID: mdl-38291386

RESUMO

BACKGROUND: Synthetic data is an emerging approach for addressing legal and regulatory concerns in biomedical research that deals with personal and clinical data, whether as a single tool or through its combination with other privacy enhancing technologies. Generating uncompromised synthetic data could significantly benefit external researchers performing secondary analyses by providing unlimited access to information while fulfilling pertinent regulations. However, the original data to be synthesized (e.g., data acquired in Living Labs) may consist of subjects' metadata (static) and a longitudinal component (set of time-dependent measurements), making it challenging to produce coherent synthetic counterparts. METHODS: Three synthetic time series generation approaches were defined and compared in this work: only generating the metadata and coupling it with the real time series from the original data (A1), generating both metadata and time series separately to join them afterwards (A2), and jointly generating both metadata and time series (A3). The comparative assessment of the three approaches was carried out using two different synthetic data generation models: the Wasserstein GAN with Gradient Penalty (WGAN-GP) and the DöppelGANger (DGAN). The experiments were performed with three different healthcare-related longitudinal datasets: Treadmill Maximal Effort Test (TMET) measurements from the University of Malaga (1), a hypotension subset derived from the MIMIC-III v1.4 database (2), and a lifelogging dataset named PMData (3). RESULTS: Three pivotal dimensions were assessed on the generated synthetic data: resemblance to the original data (1), utility (2), and privacy level (3). The optimal approach fluctuates based on the assessed dimension and metric. CONCLUSION: The initial characteristics of the datasets to be synthesized play a crucial role in determining the best approach. Coupling synthetic metadata with real time series (A1), as well as jointly generating synthetic time series and metadata (A3), are both competitive methods, while separately generating time series and metadata (A2) appears to perform more poorly overall.

Assuntos

Metadados , Privacidade , Humanos , Fatores de Tempo , Bases de Dados Factuais

11.

Does synthetic data augmentation improve the performances of machine learning classifiers for identifying health problems in patient-nurse verbal communications in home healthcare settings?

Scroggins, Jihye Kim; Topaz, Maxim; Song, Jiyoun; Zolnoori, Maryam.

J Nurs Scholarsh ; 2024 Jul 03.

Artigo em Inglês | MEDLINE | ID: mdl-38961517

RESUMO

BACKGROUND: Identifying health problems in audio-recorded patient-nurse communication is important to improve outcomes in home healthcare patients who have complex conditions with increased risks of hospital utilization. Training machine learning classifiers for identifying problems requires resource-intensive human annotation. OBJECTIVE: To generate synthetic patient-nurse communication and to automatically annotate for common health problems encountered in home healthcare settings using GPT-4. We also examined whether augmenting real-world patient-nurse communication with synthetic data can improve the performance of machine learning to identify health problems. DESIGN: Secondary data analysis of patient-nurse verbal communication data in home healthcare settings. METHODS: The data were collected from one of the largest home healthcare organizations in the United States. We used 23 audio recordings of patient-nurse communications from 15 patients. The audio recordings were transcribed verbatim and manually annotated for health problems (e.g., circulation, skin, pain) indicated in the Omaha System Classification scheme. Synthetic data of patient-nurse communication were generated using the in-context learning prompting method, enhanced by chain-of-thought prompting to improve the automatic annotation performance. Machine learning classifiers were applied to three training datasets: real-world communication, synthetic communication, and real-world communication augmented by synthetic communication. RESULTS: Average F1 scores improved from 0.62 to 0.63 after training data were augmented with synthetic communication. The largest increase was observed using the XGBoost classifier where F1 scores improved from 0.61 to 0.64 (about 5% improvement). When trained solely on either real-world communication or synthetic communication, the classifiers showed comparable F1 scores of 0.62-0.61, respectively. CONCLUSION: Integrating synthetic data improves machine learning classifiers' ability to identify health problems in home healthcare, with performance comparable to training on real-world data alone, highlighting the potential of synthetic data in healthcare analytics. CLINICAL RELEVANCE: This study demonstrates the clinical relevance of leveraging synthetic patient-nurse communication data to enhance machine learning classifier performances to identify health problems in home healthcare settings, which will contribute to more accurate and efficient problem identification and detection of home healthcare patients with complex health conditions.

12.

Synthetic Training Data in AI-Driven Quality Inspection: The Significance of Camera, Lighting, and Noise Parameters.

Schraml, Dominik; Notni, Gunther.

Sensors (Basel) ; 24(2)2024 Jan 19.

Artigo em Inglês | MEDLINE | ID: mdl-38276341

RESUMO

Industrial-quality inspections, particularly those leveraging AI, require significant amounts of training data. In fields like injection molding, producing a multitude of defective parts for such data poses environmental and financial challenges. Synthetic training data emerge as a potential solution to address these concerns. Although the creation of realistic synthetic 2D images from 3D models of injection-molded parts involves numerous rendering parameters, the current literature on the generation and application of synthetic data in industrial-quality inspection scarcely addresses the impact of these parameters on AI efficacy. In this study, we delve into some of these key parameters, such as camera position, lighting, and computational noise, to gauge their effect on AI performance. By utilizing Blender software, we procedurally introduced the "flash" defect on a 3D model sourced from a CAD file of an injection-molded part. Subsequently, with Blender's Cycles rendering engine, we produced datasets for each parameter variation. These datasets were then used to train a pre-trained EfficientNet-V2 for the binary classification of the "flash" defect. Our results indicate that while noise is less critical, using a range of noise levels in training can benefit model adaptability and efficiency. Variability in camera positioning and lighting conditions was found to be more significant, enhancing model performance even when real-world conditions mirror the controlled synthetic environment. These findings suggest that incorporating diverse lighting and camera dynamics is beneficial for AI applications, regardless of the consistency in real-world operational settings.

13.

Latent Space Representations for Marker-Less Realtime Hand-Eye Calibration.

Martínez-Franco, Juan Camilo; Rojas-Álvarez, Ariel; Tabares, Alejandra; Álvarez-Martínez, David; Marín-Moreno, César Augusto.

Sensors (Basel) ; 24(14)2024 Jul 18.

Artigo em Inglês | MEDLINE | ID: mdl-39066062

RESUMO

Marker-less hand-eye calibration permits the acquisition of an accurate transformation between an optical sensor and a robot in unstructured environments. Single monocular cameras, despite their low cost and modest computation requirements, present difficulties for this purpose due to their incomplete correspondence of projected coordinates. In this work, we introduce a hand-eye calibration procedure based on the rotation representations inferred by an augmented autoencoder neural network. Learning-based models that attempt to directly regress the spatial transform of objects such as the links of robotic manipulators perform poorly in the orientation domain, but this can be overcome through the analysis of the latent space vectors constructed in the autoencoding process. This technique is computationally inexpensive and can be run in real time in markedly varied lighting and occlusion conditions. To evaluate the procedure, we use a color-depth camera and perform a registration step between the predicted and the captured point clouds to measure translation and orientation errors and compare the results to a baseline based on traditional checkerboard markers.

14.

Bridging Formal Shape Models and Deep Learning: A Novel Fusion for Understanding 3D Objects.

Zhang, Jincheng; Willis, Andrew R.

Sensors (Basel) ; 24(12)2024 Jun 15.

Artigo em Inglês | MEDLINE | ID: mdl-38931658

RESUMO

This article describes a novel fusion of a generative formal model for three-dimensional (3D) shapes with deep learning (DL) methods to understand the geometric structure of 3D objects and the relationships between their components, given a collection of unorganized point cloud measurements. Formal 3D shape models are implemented as shape grammar programs written in Procedural Shape Modeling Language (PSML). Users write PSML programs to describe complex objects, and DL networks estimate the configured free parameters of the program to generate 3D shapes. Users write PSML programs to enforce fundamental rules that define an object class and encode object attributes, including shapes, components, size, position, etc., into a parametric representation of objects. This fusion of the generative model with DL offers artificial intelligence (AI) models an opportunity to better understand the geometric organization of objects in terms of their components and their relationships to other objects. This approach allows human-in-the-loop control over DL estimates by specifying lists of candidate objects, the shape variations that each object can exhibit, and the level of detail or, equivalently, dimension of the latent representation of the shape. The results demonstrate the advantages of the proposed method over competing approaches.

15.

Automated Porosity Characterization for Aluminum Die Casting Materials Using X-ray Radiography, Synthetic X-ray Data Augmentation by Simulation, and Machine Learning.

Bosse, Stefan; Lehmhus, Dirk; Kumar, Sanjeev.

Sensors (Basel) ; 24(9)2024 May 05.

Artigo em Inglês | MEDLINE | ID: mdl-38733042

RESUMO

Detection and characterization of hidden defects, impurities, and damages in homogeneous materials like aluminum die casting materials, as well as composite materials like Fiber-Metal Laminates (FML), is still a challenge. This work discusses methods and challenges in data-driven modeling of automated damage and defect detectors using measured X-ray single- and multi-projection images. Three main issues are identified: Data and feature variance, data feature labeling (for supervised machine learning), and the missing ground truth. It will be shown that simulation of synthetic measuring data can deliver a ground truth dataset and accurate labeling for data-driven modeling, but it cannot be used directly to predict defects in manufacturing processes. Noise has a significant impact on the feature detection and will be discussed. Data-driven feature detectors are implemented with semantic pixel Convolutional Neural Networks. Experimental data are measured with different devices: A low-quality and low-cost (Low-Q) X-ray radiography, a typical industrial mid-quality X-ray radiography and Computed Tomography (CT) system, and a state-of-the-art high-quality µ-CT device. The goals of this work are the training of robust and generalized data-driven ML feature detectors with synthetic data only and the transition from CT to single-projection radiography imaging and analysis. Although, as the title implies, the primary task is pore characterization in aluminum high-pressure die-cast materials, but the methods and results are not limited to this use case.

16.

PVS-GEN: Systematic Approach for Universal Synthetic Data Generation Involving Parameterization, Verification, and Segmentation.

Kim, Kyung-Min; Kwak, Jong Wook.

Sensors (Basel) ; 24(1)2024 Jan 02.

Artigo em Inglês | MEDLINE | ID: mdl-38203126

RESUMO

Synthetic data generation addresses the challenges of obtaining extensive empirical datasets, offering benefits such as cost-effectiveness, time efficiency, and robust model development. Nonetheless, synthetic data-generation methodologies still encounter significant difficulties, including a lack of standardized metrics for modeling different data types and comparing generated results. This study introduces PVS-GEN, an automated, general-purpose process for synthetic data generation and verification. The PVS-GEN method parameterizes time-series data with minimal human intervention and verifies model construction using a specific metric derived from extracted parameters. For complex data, the process iteratively segments the empirical dataset until an extracted parameter can reproduce synthetic data that reflects the empirical characteristics, irrespective of the sensor data type. Moreover, we introduce the PoR metric to quantify the quality of the generated data by evaluating its time-series characteristics. Consequently, the proposed method can automatically generate diverse time-series data that covers a wide range of sensor types. We compared PVS-GEN with existing synthetic data-generation methodologies, and PVS-GEN demonstrated a superior performance. It generated data with a similarity of up to 37.1% across multiple data types and by 19.6% on average using the proposed metric, irrespective of the data type.

17.

Performance Assessment of Object Detection Models Trained with Synthetic Data: A Case Study on Electrical Equipment Detection.

Santos, David O; Montalvão, Jugurta; Araujo, Charles A C; Lebre, Ulisses D E S; Ferreira, Tarso V; Freire, Eduardo O.

Sensors (Basel) ; 24(13)2024 Jun 28.

Artigo em Inglês | MEDLINE | ID: mdl-39000997

RESUMO

This paper explores a data augmentation approach for images of rigid bodies, particularly focusing on electrical equipment and analogous industrial objects. By leveraging manufacturer-provided datasheets containing precise equipment dimensions, we employed straightforward algorithms to generate synthetic images, permitting the expansion of the training dataset from a potentially unlimited viewpoint. In scenarios lacking genuine target images, we conducted a case study using two well-known detectors, representing two machine-learning paradigms: the Viola-Jones (VJ) and You Only Look Once (YOLO) detectors, trained exclusively on datasets featuring synthetic images as the positive examples of the target equipment, namely lightning rods and potential transformers. Performances of both detectors were assessed using real images in both visible and infrared spectra. YOLO consistently demonstrates F1 scores below 26% in both spectra, while VJ's scores lie in the interval from 38% to 61%. This performance discrepancy is discussed in view of paradigms' strengths and weaknesses, whereas the relatively high scores of at least one detector are taken as empirical evidence in favor of the proposed data augmentation approach.

18.

Generating Synthetic Health Sensor Data for Privacy-Preserving Wearable Stress Detection.

Lange, Lucas; Wenzlitschke, Nils; Rahm, Erhard.

Sensors (Basel) ; 24(10)2024 May 11.

Artigo em Inglês | MEDLINE | ID: mdl-38793906

RESUMO

Smartwatch health sensor data are increasingly utilized in smart health applications and patient monitoring, including stress detection. However, such medical data often comprise sensitive personal information and are resource-intensive to acquire for research purposes. In response to this challenge, we introduce the privacy-aware synthetization of multi-sensor smartwatch health readings related to moments of stress, employing Generative Adversarial Networks (GANs) and Differential Privacy (DP) safeguards. Our method not only protects patient information but also enhances data availability for research. To ensure its usefulness, we test synthetic data from multiple GANs and employ different data enhancement strategies on an actual stress detection task. Our GAN-based augmentation methods demonstrate significant improvements in model performance, with private DP training scenarios observing an 11.90-15.48% increase in F1-score, while non-private training scenarios still see a 0.45% boost. These results underline the potential of differentially private synthetic data in optimizing utility-privacy trade-offs, especially with the limited availability of real training samples. Through rigorous quality assessments, we confirm the integrity and plausibility of our synthetic data, which, however, are significantly impacted when increasing privacy requirements.

Assuntos

Privacidade , Dispositivos Eletrônicos Vestíveis , Humanos , Monitorização Fisiológica/métodos , Monitorização Fisiológica/instrumentação , Algoritmos

19.

[Re-identification potential of structured health data]. / Das Reidentifikationspotenzial von strukturierten Gesundheitsdaten.

Drechsler, Jörg; Pauly, Hannah.

Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz ; 67(2): 164-170, 2024 Feb.

Artigo em Alemão | MEDLINE | ID: mdl-38231225

RESUMO

Broad access to health data offers great potential for science and research. However, health data often contains sensitive information that must be protected in a special way. In this context, the article deals with the re-identification potential of health data. After defining the relevant terms, we discuss factors that influence the re-identification potential. We summarize international privacy standards for health data and highlight the importance of background knowledge. Given that the reidentification potential is often underestimated in practice, we present strategies for mitigation based on the Five Safes concept. We also discuss classical data protection strategies as well as methods for generating synthetic health data. The article concludes with a brief discussion and outlook on the planned Health Data Lab at the Federal Institute for Drugs and Medical Devices.

Assuntos

Segurança Computacional , Privacidade , Alemanha , Confidencialidade

20.

Sensor-based characterization of construction and demolition waste at high occupancy densities using synthetic training data and deep learning.

Kronenwett, Felix; Maier, Georg; Leiss, Norbert; Gruna, Robin; Thome, Volker; Längle, Thomas.

Waste Manag Res ; : 734242X241231410, 2024 Feb 22.

Artigo em Inglês | MEDLINE | ID: mdl-38385439

RESUMO

Sensor-based monitoring of construction and demolition waste (CDW) streams plays an important role in recycling (RC). Extracted knowledge about the composition of a material stream helps identifying RC paths, optimizing processing plants and form the basis for sorting. To enable economical use, it is necessary to ensure robust detection of individual objects even with high material throughput. Conventional algorithms struggle with resulting high occupancy densities and object overlap, making deep learning object detection methods more promising. In this study, different deep learning architectures for object detection (Region-based CNN/Region-based Convolutional Neural Network (Faster R-CNN), You only look once (YOLOv3), Single Shot MultiBox Detector (SSD)) are investigated with respect to their suitability for CDW characterization. A mixture of brick and sand-lime brick is considered as an exemplary waste stream. Particular attention is paid to detection performance with increasing occupancy density and particle overlap. A method for the generation of synthetic training images is presented, which avoids time-consuming manual labelling. By testing the models trained on synthetic data on real images, the success of the method is demonstrated. Requirements for synthetic training data composition, potential improvements and simplifications of different architecture approaches are discussed based on the characteristic of the detection task. In addition, the required inference time of the presented models is investigated to ensure their suitability for use under real-time conditions.

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA