Identifying and handling data bias within primary healthcare data using synthetic data generators.
Draghi, Barbara; Wang, Zhenchen; Myles, Puja; Tucker, Allan.
Affiliation
  • Draghi B; Medicines and Healthcare products Regulatory Agency, London, UK.
  • Wang Z; Brunel University London, London, UK.
  • Myles P; Medicines and Healthcare products Regulatory Agency, London, UK.
  • Tucker A; Medicines and Healthcare products Regulatory Agency, London, UK.
Heliyon ; 10(2): e24164, 2024 Jan 30.
Article in En | MEDLINE | ID: mdl-38288010
ABSTRACT
Advanced synthetic data generators can simulate data samples that closely resemble sensitive personal datasets while significantly reducing the risk of individual identification. The use of these advanced generators holds enormous potential in the medical field, as it allows for the simulation and sharing of sensitive patient data. This enables the development and rigorous validation of novel AI technologies for accurate diagnosis and efficient disease management. Despite the availability of massive ground truth datasets (such as UK-NHS databases that contain millions of patient records), the risk of biases being carried over to data generators still exists. These biases may arise from the under-representation of specific patient cohorts due to cultural sensitivities within certain communities or standardised data collection procedures. Machine learning models can exhibit bias in various forms, including the under-representation of certain groups in the data. This can lead to missing data and inaccurate correlations and distributions, which may also be reflected in synthetic data. Our paper aims to improve synthetic data generators by introducing probabilistic approaches to first detect difficult-to-predict data samples in ground truth data and then boost them when applying the generator. In addition, we explore strategies to generate synthetic data that can reduce bias and, at the same time, improve the performance of predictive models.
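The two-step idea in the abstract (detect difficult-to-predict samples, then boost them) can be illustrated with a minimal sketch. This is not the authors' method: the frequency-based probability model, the `difficulty = 1 - P(true label | group)` score, the `boost` parameter, and the use of weighted resampling as a stand-in for a synthetic data generator are all assumptions made purely for illustration, and the toy records are invented.

```python
import random
from collections import Counter, defaultdict


def label_probabilities(records):
    """Estimate P(label | group) from counts, with Laplace smoothing."""
    counts = defaultdict(Counter)
    labels = sorted({label for _, label in records})
    for group, label in records:
        counts[group][label] += 1
    probs = {}
    for group, c in counts.items():
        total = sum(c.values()) + len(labels)
        probs[group] = {lab: (c[lab] + 1) / total for lab in labels}
    return probs


def difficulty_scores(records, probs):
    """Hypothetical difficulty score: 1 - probability of the true label."""
    return [1.0 - probs[group][label] for group, label in records]


def boosted_resample(records, scores, boost=3.0, size=None, seed=0):
    """Resample with weights that grow with difficulty (assumed scheme).

    Weighted resampling stands in for a real synthetic data generator.
    """
    rng = random.Random(seed)
    weights = [1.0 + boost * s for s in scores]
    return rng.choices(records, weights=weights, k=size or len(records))


# Invented toy data: (patient group, outcome). Outcome 1 is rare in group A.
records = [("A", 0)] * 80 + [("A", 1)] * 20 + [("B", 0)] * 5 + [("B", 1)] * 5

probs = label_probabilities(records)
scores = difficulty_scores(records, probs)
sample = boosted_resample(records, scores, size=1000)

original_share = records.count(("A", 1)) / len(records)
boosted_share = sample.count(("A", 1)) / len(sample)
print(f"share of (A, 1): original={original_share:.3f}, boosted={boosted_share:.3f}")
```

In this sketch, records whose true outcome receives low probability under the simple model get larger sampling weights, so they appear more often in the resampled set than in the original data. A real pipeline would replace both the scoring model and the resampling step with the probabilistic model and generator of choice.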

Full text: 1 Collection: 01-internacional Database: MEDLINE Type of study: Prognostic_studies Language: En Journal: Heliyon Year: 2024 Document type: Article Affiliation country: United Kingdom
