ABSTRACT
OBJECTIVES: We evaluated methods for preparing electronic health record data to reduce bias before applying artificial intelligence (AI). METHODS: We created methods for transforming raw data into a data framework for applying machine learning and natural language processing techniques to predict falls and fractures. Strategies such as inclusion and reporting of multiple races, use of mixed data sources (outpatient, inpatient, structured codes, and unstructured notes), and addressing missingness were applied to the raw data to reduce bias. The raw data were carefully curated using validated definitions to create variables such as age, race, gender, and healthcare utilization. Clinical, statistical, and data expertise informed the formation of these variables. The research team included experts with diverse professional and demographic backgrounds to incorporate diverse perspectives. RESULTS: For the prediction of falls, information extracted from radiology reports was converted to a matrix for applying machine learning. Processing the data resulted in an input of 5,377,673 reports to the machine learning algorithm, of which 45,304 were flagged as positive and 5,332,369 as negative for falls. The processed data showed lower missingness and better representation of race and diagnosis codes. For fractures, specialized algorithms extracted snippets of text around the keyword "femoral" from dual X-ray absorptiometry (DXA) scans to identify femoral neck T-scores, which are important for predicting fracture risk. The natural language processing algorithms yielded 98% accuracy and a 2% error rate. The methods to prepare data for input to artificial intelligence processes are reproducible and can be applied to other studies. CONCLUSION: The life cycle of data from raw to analytic form includes data governance, cleaning, management, and analysis. When applying artificial intelligence methods, input data must be prepared optimally to reduce algorithmic bias, as biased output is harmful. Building AI-ready data frameworks that improve efficiency can contribute to transparency and reproducibility. The roadmap for the application of AI involves applying specialized techniques to input data, some of which are suggested here. This study highlights data curation aspects to consider when preparing data for the application of artificial intelligence in order to reduce bias.
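The abstract does not specify the extraction logic; as a minimal sketch of keyword-anchored snippet extraction for T-scores, the code below takes a text window around "femoral" and looks for values with a regular expression. The function names, window size, regex pattern, and report text are illustrative assumptions, not the authors' implementation.

```python
import re

def extract_snippets(report_text, keyword="femoral", window=60):
    """Return text windows around each occurrence of the keyword (window size is an assumption)."""
    snippets = []
    for match in re.finditer(keyword, report_text, flags=re.IGNORECASE):
        start = max(match.start() - window, 0)
        end = min(match.end() + window, len(report_text))
        snippets.append(report_text[start:end])
    return snippets

def parse_t_scores(snippet):
    """Find values written like 'T-score: -2.5' inside a snippet (pattern is an assumption)."""
    return [float(v) for v in
            re.findall(r"T[- ]?score[:\s]*(-?\d+(?:\.\d+)?)", snippet, flags=re.IGNORECASE)]

# Toy report text, not real data:
report = "DXA results: femoral neck T-score: -2.7. Lumbar spine T-score: -1.9."
print([parse_t_scores(s) for s in extract_snippets(report)])
```

In practice, rules like these would be validated against manual review, which is presumably how accuracy figures such as the reported 98% are established.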
Subject(s)
Accidental Falls, Algorithms, Artificial Intelligence, Electronic Health Records, Machine Learning, Natural Language Processing, Humans, Accidental Falls/prevention & control, Bone Fractures, Female
ABSTRACT
BACKGROUND AND OBJECTIVES: Artificial intelligence (AI) models trained on multi-centric and multi-device studies can provide more robust insights and research findings compared to single-center studies. However, variability in acquisition protocols and equipment can introduce inconsistencies that hamper the effective pooling of multi-source datasets. This systematic review evaluates strategies for image harmonization, which standardizes image appearance to enable reliable AI analysis of multi-source medical imaging. METHODS: A literature search following PRISMA guidelines was conducted to identify relevant papers published between 2013 and 2023 analyzing multi-centric and multi-device medical imaging studies that utilized image harmonization approaches. RESULTS: Common image harmonization techniques included grayscale normalization (improving classification accuracy by up to 24.42%), resampling (increasing the percentage of robust radiomics features from 59.5% to 89.25%), and color normalization (enhancing AUC by up to 0.25 in external test sets). Initially, mathematical and statistical methods dominated, but the adoption of machine and deep learning has risen recently. Color imaging modalities such as digital pathology and dermatology have remained prominent application areas, though harmonization efforts have expanded to diverse fields including radiology, nuclear medicine, and ultrasound imaging. Across all the modalities covered by this review, image harmonization improved AI performance, with gains of up to 24.42% in classification accuracy and 47% in segmentation Dice scores. CONCLUSIONS: Continued progress in image harmonization represents a promising strategy for advancing healthcare by enabling large-scale, reliable analysis of integrated multi-source datasets using AI. Standardizing imaging data across clinical settings can help realize personalized, evidence-based care supported by data-driven technologies while mitigating biases associated with specific populations or acquisition protocols.
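As a hedged illustration of the simplest harmonization step named above, grayscale normalization, the sketch below rescales intensities to a common range or standardizes them. The specific variants and parameters used by the reviewed studies differ and are not reproduced here; function names and the synthetic image are assumptions.

```python
import numpy as np

def minmax_normalize(image, new_min=0.0, new_max=1.0):
    """Rescale intensities to a common range so images from different scanners become comparable."""
    img = image.astype(np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:                      # constant image: return the lower bound everywhere
        return np.full_like(img, new_min)
    return (img - lo) / (hi - lo) * (new_max - new_min) + new_min

def zscore_normalize(image):
    """Standardize intensities to zero mean and unit variance (another common harmonization step)."""
    img = image.astype(np.float64)
    return (img - img.mean()) / (img.std() + 1e-8)

# Example on a synthetic 8-bit image:
img = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
print(minmax_normalize(img).min(), minmax_normalize(img).max())
```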
Subject(s)
Artificial Intelligence, Diagnostic Imaging, Humans, Diagnostic Imaging/standards, Computer-Assisted Image Processing/methods, Multicenter Studies as Topic
ABSTRACT
OBJECTIVES: Data-driven decision support tools are increasingly recognized for their potential to transform health care. However, such tools are often developed on predefined research datasets without adequate knowledge of the origin of these data and how they were selected. How a dataset is extracted from a clinical database can profoundly impact the validity, interpretability, and interoperability of the dataset and of downstream analyses, yet this process is rarely reported. Therefore, we present a case study illustrating how a definitive patient list was extracted from a clinical source database and how this can be reported. STUDY DESIGN AND SETTING: A single-center observational study was performed at an academic hospital in the Netherlands to illustrate the impact of selecting a definitive patient list for research from a clinical source database, and the importance of documenting this process. All admissions to the critical care database between January 1, 2013, and January 1, 2023, were included. RESULTS: An interdisciplinary team collaborated to identify and address potential sources of data insufficiency and uncertainty. We demonstrate a stepwise data preparation process, reducing the clinical source database of 54,218 admissions to a definitive patient list of 21,553 admissions. Transparent documentation of the data preparation process improves the quality of the definitive patient list before analysis of the corresponding patient data. This study generated seven important recommendations for preparing observational health-care data for research purposes. CONCLUSION: Documenting data preparation is essential for understanding a research dataset originating from a clinical source database before analyzing health-care data. The findings contribute to establishing data standards and offer insights into the complexities of preparing health-care data for scientific investigation. Meticulous data preparation, and documentation thereof, will improve research validity and advance critical care.
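The study's actual inclusion criteria are not listed in the abstract; the sketch below only illustrates the general pattern of a documented, stepwise reduction from a source table to a definitive patient list. The filters, column names, and toy data are hypothetical.

```python
import pandas as pd

def apply_step(df, mask, description, log):
    """Apply one selection step and record how many admissions remain (for transparent reporting)."""
    kept = df[mask]
    log.append({"step": description, "before": len(df), "after": len(kept)})
    return kept

# admissions = pd.read_sql(...)               # clinical source database (access method is hypothetical)
admissions = pd.DataFrame({                    # toy stand-in
    "admission_id": range(6),
    "admission_time": pd.to_datetime(["2012-05-01", "2014-02-01", "2015-07-10",
                                      "2016-01-01", "2019-03-03", "2023-06-01"]),
    "is_test_patient": [False, False, True, False, False, False],
})

log = []
cohort = apply_step(admissions,
                    (admissions["admission_time"] >= "2013-01-01") &
                    (admissions["admission_time"] < "2023-01-01"),
                    "admitted between 2013-01-01 and 2023-01-01", log)
cohort = apply_step(cohort, ~cohort["is_test_patient"], "remove test/administrative records", log)
print(pd.DataFrame(log))   # an auditable record of the path from source database to definitive patient list
```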
Subject(s)
Factual Databases, Humans, Factual Databases/standards, Factual Databases/statistics & numerical data, Netherlands, Documentation/standards, Documentation/statistics & numerical data, Documentation/methods, Critical Care/standards, Critical Care/statistics & numerical data
ABSTRACT
Data collection, curation, and cleaning constitute a crucial phase in Machine Learning (ML) projects. In biomedical ML, it is often desirable to leverage multiple datasets to increase sample size and diversity, but this poses unique challenges, which arise from heterogeneity in study design, data descriptors, file system organization, and metadata. In this study, we present an approach to the integration of multiple brain MRI datasets with a focus on homogenization of their organization and preprocessing for ML. We use our own fusion example (approximately 84,000 images from 54,000 subjects, 12 studies, and 88 individual scanners) to illustrate and discuss the issues faced by study fusion efforts, and we examine key decisions necessary during dataset homogenization, presenting in detail a database structure flexible enough to accommodate multiple observational MRI datasets. We believe our approach can provide a basis for future similarly-minded biomedical ML projects.
ABSTRACT
Cosmetics consumers need to be aware of their skin type before purchasing products. Identifying skin types can be challenging, especially when they vary from oily to dry in different areas, and a skin specialist provides more accurate results. In recent years, artificial intelligence and machine learning have been utilized across various fields, including medicine, to assist in identifying and predicting situations. This study developed a skin type classification model using a Convolutional Neural Network (CNN) deep learning algorithm. The dataset consisted of normal, oily, and dry skin images, with 112 images for normal skin, 120 images for oily skin, and 97 images for dry skin. Image quality was enhanced using the Contrast Limited Adaptive Histogram Equalization (CLAHE) technique, and data augmentation by rotation was applied to increase dataset variety, resulting in a total of 1,316 images. CNN architectures including MobileNet-V2, EfficientNet-V2, InceptionV2, and ResNet-V1 were optimized and evaluated. Findings showed that the EfficientNet-V2 architecture performed best, achieving an accuracy of 91.55% with an average loss of 22.74%. To further improve the model, hyperparameter tuning was conducted, resulting in an accuracy of 94.57% and a loss of 13.77%. Model performance was validated using 10-fold cross-validation and tested on unseen data, achieving an accuracy of 89.70% with a loss of 21.68%.
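A minimal preprocessing sketch for the two steps named above, CLAHE enhancement and rotation-based augmentation, using OpenCV. Parameter values, rotation angles, and file handling are assumptions rather than the study's exact settings.

```python
import cv2
import numpy as np

def enhance_with_clahe(bgr_image, clip_limit=2.0, tile_grid=(8, 8)):
    """Apply CLAHE to the lightness channel so local contrast improves without over-amplifying noise."""
    lab = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid)
    return cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)

def rotate(image, angle_deg):
    """Rotate around the image center; used here to augment the skin-image dataset."""
    h, w = image.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    return cv2.warpAffine(image, m, (w, h))

# img = cv2.imread("oily_01.jpg")             # file name is hypothetical
img = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)   # synthetic stand-in image
augmented = [rotate(enhance_with_clahe(img), a) for a in (0, 90, 180, 270)]
```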
ABSTRACT
Pathology is decisive for disease diagnosis but relies heavily on experienced pathologists. In recent years, there has been growing interest in the use of artificial intelligence in pathology (AIP) to enhance diagnostic accuracy and efficiency. However, the impressive performance of deep learning-based AIP in laboratory settings often proves challenging to replicate in clinical practice. Because data preparation is important for AIP, this paper reviewed AIP-related studies in the PubMed database published from January 2017 to February 2022; 118 studies were included. An in-depth analysis of data preparation methods is conducted, encompassing the acquisition of pathological tissue slides, data cleaning, screening, and subsequent digitization. Expert review, image annotation, and dataset division for model training and validation are also discussed. Furthermore, we delve into the reasons behind the challenges in reproducing the high performance of AIP in clinical settings and present effective strategies to enhance AIP's clinical performance. The robustness of AIP depends on a randomized collection of representative disease slides, incorporating rigorous quality control and screening, correction of digital discrepancies, reasonable annotation, and sufficient data volume. Digital pathology is fundamental to clinical-grade AIP, and the techniques of data standardization and weakly supervised learning based on whole slide images (WSI) are effective ways to overcome obstacles to performance reproduction. The key to performance reproducibility lies in having representative data, an adequate amount of labeling, and consistency across multiple centers. Digital pathology for clinical diagnosis, data standardization, and WSI-based weakly supervised learning will hopefully enable clinical-grade AIP.
ABSTRACT
In a data-driven context, bionic polarization navigation requires a large amount of skylight polarization pattern data with diversity, complete ground truth, and scene information. However, acquiring such data in urban environments, where bionic polarization navigation is widely used, remains challenging. In this paper, we propose a virtual-real fusion framework for a skylight polarization pattern simulator and provide a data preparation method that complements existing purely simulated or purely measured approaches. The framework consists of a virtual part simulating the ground truth of the skylight polarization pattern, a real part measuring scene information, and a fusion part combining the two according to the imaging projection relationship. To illustrate the framework, we constructed a simulator instance adapted to urban environments and clear weather and verified it in 174 urban scenes. The results show that the simulator can provide a large volume of diverse urban skylight polarization pattern data with scene information and complete ground truth based on a few practical measurements. Moreover, we released a dataset based on these results and open-sourced our code to help researchers prepare and adapt datasets to their research targets.
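The simulator itself is not described in code in the abstract; as a rough sketch of the "virtual" part, the single-scattering Rayleigh sky model below gives the degree of polarization and the E-vector direction for one viewing direction. The maximum degree of polarization and the coordinate conventions are assumptions, and the authors' simulator may use a different or more refined sky model.

```python
import numpy as np

def direction(zenith_deg, azimuth_deg):
    """Unit vector for a sky direction given zenith and azimuth angles (degrees)."""
    z, a = np.radians(zenith_deg), np.radians(azimuth_deg)
    return np.array([np.sin(z) * np.cos(a), np.sin(z) * np.sin(a), np.cos(z)])

def rayleigh_dop(view, sun, dop_max=0.75):
    """Degree of polarization from the single-scattering Rayleigh model.

    gamma is the angle between the view and sun directions; dop_max is an assumed
    atmospheric maximum (the real value depends on turbidity and wavelength).
    """
    cos_g = np.clip(np.dot(view, sun), -1.0, 1.0)
    return dop_max * (1.0 - cos_g ** 2) / (1.0 + cos_g ** 2)

def e_vector(view, sun):
    """Polarization direction: perpendicular to the scattering plane spanned by view and sun."""
    e = np.cross(sun, view)
    return e / np.linalg.norm(e)

sun = direction(zenith_deg=60, azimuth_deg=0)
view = direction(zenith_deg=30, azimuth_deg=90)
print(rayleigh_dop(view, sun), e_vector(view, sun))
```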
ABSTRACT
Coordinate-based meta-analysis (CBMA) is a powerful technique in the field of human brain imaging research. Because it is so widely used, several procedures for data preparation and post hoc analyses have been proposed so far. However, these steps are often performed manually by the researcher and are therefore time-consuming and potentially prone to error. We hence developed the Coordinate-Based Meta-Analyses Toolbox (CBMAT), a suite of user-friendly and automated MATLAB® functions allowing one to perform all these procedures in a fast, reproducible, and reliable way. Besides describing the code, in the present paper we also provide an annotated example of using CBMAT on a dataset including 34 experiments. CBMAT can therefore substantially improve the way data are handled when performing CBMAs. The code can be downloaded from https://github.com/Jordi-Manuello/CBMAT.git .
ABSTRACT
It is well known that Artificial Intelligence (AI), and in particular Machine Learning (ML), is not effective without good data preparation, as also pointed out by the recent wave of data-centric AI. Data preparation is the process of gathering, transforming, and cleaning raw data prior to processing and analysis. Since data nowadays often reside in distributed and heterogeneous data sources, the first activity of data preparation requires collecting data from suitable data sources and data services, which are themselves often distributed and heterogeneous. It is thus essential that providers describe their data services in a way that makes them compliant with the FAIR guiding principles, i.e., makes them automatically Findable, Accessible, Interoperable, and Reusable (FAIR). The notion of data abstraction has been introduced exactly to meet this need. Abstraction is a kind of reverse engineering task that automatically provides a semantic characterization of a data service made available by a provider. The goal of this paper is to review the results obtained so far in data abstraction by presenting the formal framework for its definition, reporting on the decidability and complexity of the main theoretical problems concerning abstraction, and discussing open issues and interesting directions for future research.
ABSTRACT
Automated methods for detecting fraudulent healthcare providers have the potential to save billions of dollars in healthcare costs and improve the overall quality of patient care. This study presents a data-centric approach to improve healthcare fraud classification performance and reliability using Medicare claims data. Publicly available data from the Centers for Medicare & Medicaid Services (CMS) are used to construct nine large-scale labeled data sets for supervised learning. First, we leverage CMS data to curate the 2013-2019 Part B, Part D, and Durable Medical Equipment, Prosthetics, Orthotics, and Supplies (DMEPOS) Medicare fraud classification data sets. We provide a review of each data set and data preparation techniques to create Medicare data sets for supervised learning and we propose an improved data labeling process. Next, we enrich the original Medicare fraud data sets with up to 58 new provider summary features. Finally, we address a common model evaluation pitfall and propose an adjusted cross-validation technique that mitigates target leakage to provide reliable evaluation results. Each data set is evaluated on the Medicare fraud classification task using extreme gradient boosting and random forest learners, multiple complementary performance metrics, and 95% confidence intervals. Results show that the new enriched data sets consistently outperform the original Medicare data sets that are currently used in related works. Our results encourage the data-centric machine learning workflow and provide a strong foundation for data understanding and preparation techniques for machine learning applications in healthcare fraud.
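The abstract does not detail the adjusted cross-validation technique; one common way to mitigate the kind of target leakage that arises when the same provider contributes rows to several data sets or years is group-aware splitting, sketched below on synthetic data. The grouping key, feature matrix, and labels are assumptions, not the CMS data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 10))                 # provider summary features (synthetic stand-in)
y = rng.integers(0, 2, size=n)               # fraud label (synthetic)
provider_id = rng.integers(0, 200, size=n)   # the same provider can contribute several rows (e.g., years)

# Keeping each provider's rows within a single fold prevents information about a
# provider's label from leaking between the training and validation splits.
cv = GroupKFold(n_splits=5)
scores = cross_val_score(RandomForestClassifier(n_estimators=200, random_state=0),
                         X, y, cv=cv, groups=provider_id, scoring="roc_auc")
print(scores.mean())
```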
ABSTRACT
Full spectrum flow cytometry (FSFC) allows for the analysis of more than 40 parameters at the single-cell level. Compared to the practice of manual gating, high-dimensional data analysis can be used to fully explore single-cell datasets and reduce analysis time. As panel size and complexity increase, so too do the detail and time required to prepare and validate the quality of the resulting data for use in downstream high-dimensional data analyses. To ensure data analysis algorithms can be used efficiently and to avoid artifacts, some important steps should be considered. These include data cleaning (such as eliminating variable signal change over time and removing cell doublets and antibody aggregates), proper unmixing of full spectrum data, ensuring correct scale transformation, and correcting for batch effects. We have developed a methodical step-by-step protocol to prepare full spectrum high-dimensional data for use with high-dimensional data analyses, with a focus on visualizing the impact of each step of data preparation using dimensionality reduction algorithms. Application of our workflow will aid FSFC users in their efforts to apply quality control methods to their datasets for use in high-dimensional analysis, and help them obtain valid and reproducible results. © 2023 Wiley Periodicals LLC. Basic Protocol 1: Data cleaning. Basic Protocol 2: Validating the quality of unmixing. Basic Protocol 3: Data scaling. Basic Protocol 4: Batch-to-batch normalization.
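As a hedged illustration of the scale-transformation step (Basic Protocol 3), the arcsinh transform below is a common choice for cytometry data; the protocol's actual transformation and per-channel cofactors are not specified in the abstract, and the values here are placeholders.

```python
import numpy as np

def arcsinh_transform(intensities, cofactor=6000.0):
    """Arcsinh scaling commonly used for spectral/full spectrum cytometry data.

    The cofactor controls the width of the linear region around zero and is usually
    tuned per channel; 6000 is only a placeholder value.
    """
    return np.arcsinh(np.asarray(intensities, dtype=float) / cofactor)

events = np.array([[-500.0, 1e3, 5e4],     # rows = cells, columns = unmixed channels (synthetic)
                   [0.0, 2e5, 1e6]])
print(arcsinh_transform(events))
```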
Subject(s)
Algorithms, Data Accuracy, Flow Cytometry/methods, Antibodies
ABSTRACT
Electronic health records (EHRs) have been successfully used in data science and machine learning projects. However, most of these data are collected for clinical use rather than for retrospective analysis. This means that researchers typically face many different issues when attempting to access and prepare the data for secondary use. We aimed to investigate how raw EHRs can be accessed and prepared in retrospective data science projects in a disciplined, effective, and efficient way. We report our experience and findings from a large-scale data science project analyzing routinely acquired retrospective data from the Kepler University Hospital in Linz, Austria. The project involved data collection from more than 150,000 patients over a period of 10 years. It included diverse data modalities, such as static demographic data, irregularly acquired laboratory test results, regularly sampled vital signs, and high-frequency physiological waveform signals. Raw medical data can be corrupted in many unexpected ways that demand thorough manual inspection and highly individualized data cleaning solutions. We present a general data preparation workflow, which was shaped in the course of our project and consists of the following 7 steps: obtain a rough overview of the available EHR data, define clinically meaningful labels for supervised learning, extract relevant data from the hospital's data warehouses, match data extracted from different sources, deidentify them, detect errors and inconsistencies therein through a careful exploratory analysis, and implement a suitable data processing pipeline in actual code. Only a few of the data preparation issues encountered in our project were addressed by the generic medical data preprocessing tools that have been proposed recently. Instead, highly individualized solutions for the specific data used in one's own research seem inevitable. We believe that the proposed workflow can serve as guidance for practitioners, helping them to identify and address potential problems early and avoid some common pitfalls.
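As an illustration of two of the seven steps, matching data extracted from different sources and deidentifying them, the sketch below joins two extracts on a shared key and then replaces the identifier with a salted hash. Table contents, the salt, and the hash truncation length are illustrative assumptions, not the project's actual pipeline.

```python
import hashlib
import pandas as pd

def pseudonymize(patient_id, salt="replace-with-project-secret"):
    """Replace the hospital patient ID with a salted hash so extracts can be linked but not trivially re-identified."""
    return hashlib.sha256(f"{salt}:{patient_id}".encode()).hexdigest()[:16]

demographics = pd.DataFrame({"patient_id": [1, 2], "year_of_birth": [1950, 1962]})
lab_results = pd.DataFrame({"patient_id": [1, 1, 2], "test": ["crea", "crp", "crea"], "value": [1.1, 30.0, 0.9]})

# Step "match data extracted from different sources": join on the shared key ...
merged = lab_results.merge(demographics, on="patient_id", how="left")
# ... then step "deidentify": swap the direct identifier for a pseudonym and drop the original.
merged["pseudo_id"] = merged["patient_id"].map(pseudonymize)
merged = merged.drop(columns=["patient_id"])
print(merged)
```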
ABSTRACT
PURPOSE: Fiducial markers are commonly used in navigation-assisted minimally invasive spine surgery, where they help transfer image coordinates into real-world coordinates. In practice, these markers might be located outside the field-of-view (FOV) of the C-arm cone-beam computed tomography (CBCT) systems used intraoperatively, due to limited detector sizes. As a consequence, reconstructed markers in CBCT volumes suffer from artifacts and have distorted shapes, which poses an obstacle to navigation. METHODS: In this work, we propose two fiducial marker detection methods: direct detection from distorted markers (direct method) and detection after marker recovery (recovery method). For direct detection from distorted markers in reconstructed volumes, an efficient automatic marker detection method using two neural networks and a conventional circle detection algorithm is proposed. For marker recovery, a task-specific data preparation strategy is proposed to recover markers from severely truncated data, after which a conventional marker detection algorithm is applied for position detection. The networks in both methods are trained on simulated data. For the direct method, 6,800 images and 10,000 images are generated to train the U-Net and ResNet50, respectively. For the recovery method, the training set includes 1,360 images for FBPConvNet and Pix2pixGAN. A simulated data set with 166 markers and four cadaver cases with real fiducials are used for evaluation. RESULTS: The two methods are evaluated on simulated data and real cadaver data. The direct method achieves 100% detection rates within 1 mm detection error on simulated data with normal truncation and on simulated data with heavier noise, but detects only 94.6% of markers in the extremely severe truncation case. The recovery method detects all markers successfully in the three test data sets, and around 95% of markers are detected within 0.5 mm error. For real cadaver data, both methods achieve 100% marker detection rates with mean registration error below 0.2 mm. CONCLUSIONS: Our experiments demonstrate that the direct method detects distorted markers accurately and that the recovery method with the task-specific data preparation strategy has high robustness and generalizability across data sets. The task-specific data preparation is able to reconstruct structures of interest outside the FOV from severely truncated data better than conventional data preparation.
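The abstract mentions a conventional circle detection algorithm for locating marker cross-sections; a generic sketch of that idea using a Hough transform on a single reconstructed slice is shown below. All parameter values are assumptions that would need tuning on real CBCT data, and this is not the authors' pipeline.

```python
import cv2
import numpy as np

def detect_marker_circles(slice_2d, expected_radius_px=(3, 12)):
    """Detect circular cross-sections of fiducial markers in one reconstructed slice.

    Blur size, Hough thresholds, and the radius range are illustrative values.
    """
    img = cv2.normalize(slice_2d, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    img = cv2.medianBlur(img, 5)
    circles = cv2.HoughCircles(img, cv2.HOUGH_GRADIENT, dp=1, minDist=20,
                               param1=100, param2=15,
                               minRadius=expected_radius_px[0], maxRadius=expected_radius_px[1])
    return [] if circles is None else circles[0].tolist()   # each entry: (x, y, radius)

slice_2d = np.zeros((256, 256), dtype=np.float32)
cv2.circle(slice_2d, (100, 120), 6, 1.0, -1)   # synthetic bright marker cross-section
print(detect_marker_circles(slice_2d))
```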
Subject(s)
Cone-Beam Computed Tomography, Fiducial Markers, Algorithms, Artifacts, Cadaver, Cone-Beam Computed Tomography/methods, Humans, Computer-Assisted Image Processing/methods, Imaging Phantoms
ABSTRACT
Objective: To quantify prediction model performance in relation to data preparation choices when using electronic health records (EHR). Study Design and Setting: Cox proportional hazards models were developed for predicting the first-ever main adverse cardiovascular events using Dutch primary care EHR data. The reference model was based on a 1-year run-in period, cardiovascular events were defined based on both EHR diagnosis and medication codes, and missing values were multiply imputed. We compared data preparation choices based on (i) length of the run-in period (2- or 3-year run-in); (ii) outcome definition (EHR diagnosis codes or medication codes only); and (iii) methods addressing missing values (mean imputation or complete case analysis) by making variations on the derivation set and testing their impact in a validation set. Results: We included 89,491 patients in whom 6,736 first-ever main adverse cardiovascular events occurred during a median follow-up of 8 years. Outcome definition based only on diagnosis codes led to a systematic underestimation of risk (calibration curve intercept: 0.84; 95% CI: 0.83-0.84), while complete case analysis led to overestimation (calibration curve intercept: -0.52; 95% CI: -0.53 to -0.51). Differences in the length of the run-in period showed no relevant impact on calibration and discrimination. Conclusion: Data preparation choices regarding outcome definition or methods to address missing values can have a substantial impact on the calibration of predictions, hampering reliable clinical decision support. This study further illustrates the urgency of transparent reporting of modeling choices in an EHR data setting.
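For readers unfamiliar with the reported metric, the calibration intercept (calibration-in-the-large) in a validation set is commonly estimated by regressing the observed outcome on the model's linear predictor entered as an offset. The sketch below shows the logistic version for a fixed prediction horizon on synthetic data; the study itself used Cox proportional hazards models, so this is only an illustration of the concept.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000
lp = rng.normal(-2.5, 1.0, size=n)                    # predicted log-odds from a derivation model (synthetic)
y = rng.binomial(1, 1 / (1 + np.exp(-(lp + 0.8))))    # simulated outcomes with systematically higher true risk

# Calibration-in-the-large: intercept of an intercept-only logistic model with lp as offset.
# Values near 0 indicate good overall calibration; positive values indicate underestimated risk.
fit = sm.GLM(y, np.ones((n, 1)), family=sm.families.Binomial(), offset=lp).fit()
print(float(fit.params[0]))                           # roughly 0.8 for this simulation
```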
ABSTRACT
Advances in the manufacturing industry have led to modern approaches such as Industry 4.0, Cyber-Physical Systems, Smart Manufacturing (SM) and Digital Twins. The traditional manufacturing architecture that consisted of hierarchical layers has evolved into a hierarchy-free network in which all the areas of a manufacturing enterprise are interconnected. The field devices on the shop floor generate large amounts of data that can be useful for maintenance planning. Prognostics and Health Management (PHM) approaches use these data to support fault detection and Remaining Useful Life (RUL) estimation. Although there is a significant amount of research focused primarily on tool wear prediction and Condition-Based Monitoring (CBM), little attention has been given to the multiple facets of PHM. This paper reviews PHM approaches and current research trends and proposes a three-phased interoperable framework to implement Smart Prognostics and Health Management (SPHM). The uniqueness of SPHM lies in its framework, which makes it applicable to any manufacturing operation across the industry. The framework consists of three phases: Phase 1 covers the shopfloor setup and data acquisition steps, Phase 2 describes steps to prepare and analyze the data, and Phase 3 consists of modeling, predictions, and deployment. The first two phases of SPHM are addressed in detail, and an overview is provided for the third phase, which is part of ongoing research. As a use case, the first two phases of the SPHM framework are applied to data from a milling machine operation.
Subject(s)
Manufacturing Industry
ABSTRACT
With the rapid increase in sequencing data, human host status inference (e.g. healthy or sick) from microbiome data has become an important issue. Existing studies are mostly based on single-point microbiome composition, while it is rare that the host status is predicted from longitudinal microbiome data. However, single-point-based methods cannot capture the dynamic patterns between the temporal changes and host status. Therefore, it remains challenging to build good predictive models as well as scaling to different microbiome contexts. On the other hand, existing methods are mainly targeted for disease prediction and seldom investigate other host statuses. To fill the gap, we propose a comprehensive deep learning-based framework that utilizes longitudinal microbiome data as input to infer the human host status. Specifically, the framework is composed of specific data preparation strategies and a recurrent neural network tailored for longitudinal microbiome data. In experiments, we evaluated the proposed method on both semi-synthetic and real datasets based on different sequencing technologies and metagenomic contexts. The results indicate that our method achieves robust performance compared to other baseline and state-of-the-art classifiers and provides a significant reduction in prediction time.
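Architecture details are not given in the abstract; the sketch below is a minimal recurrent classifier over per-time-point taxa abundances, written in PyTorch as an assumption-laden stand-in (layer sizes, normalization, and input shapes are illustrative, not the proposed framework).

```python
import torch
import torch.nn as nn

class HostStatusRNN(nn.Module):
    """Minimal recurrent classifier: a sequence of per-time-point taxa abundances -> host status logit."""
    def __init__(self, n_taxa, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(input_size=n_taxa, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, time_points, n_taxa)
        _, h = self.rnn(x)                # h: (1, batch, hidden); last hidden state summarizes the trajectory
        return self.head(h.squeeze(0))    # (batch, 1) logits

model = HostStatusRNN(n_taxa=200)
x = torch.rand(8, 5, 200)                 # 8 subjects, 5 time points, 200 taxa (synthetic)
x = x / x.sum(dim=-1, keepdim=True)       # simple per-sample relative-abundance normalization; real pipelines vary
logits = model(x)
print(torch.sigmoid(logits).shape)        # probabilities of the host status of interest
```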
Subject(s)
Computational Biology/methods, Host Microbial Interactions, Microbiota, Neural Networks (Computer), Algorithms, Data Analysis, Deep Learning, Humans, Metagenomics/methods, 16S Ribosomal RNA
ABSTRACT
OBJECTIVES: We describe a systematic approach to preparing data in the conduct of Individual Participant Data (IPD) analysis. STUDY DESIGN AND SETTING: A guidance paper proposing methods for preparing individual participant data for meta-analysis from multiple study sources, developed in consultation with relevant guidance and experts in IPD. We present an example of how these steps were applied in checking data for our own IPD meta-analysis (IPD-MA). RESULTS: We propose five steps of Processing, Replication, Imputation, Merging, and Evaluation to prepare individual participant data for meta-analysis (PRIME-IPD). Using our own IPD-MA as an exemplar, we found that this approach identified missing variables and potential inconsistencies in the data, facilitated the standardization of indicators across studies, confirmed that the correct data were received from investigators, and resulted in a single, verified dataset for IPD-MA. CONCLUSION: The PRIME-IPD approach can assist researchers in systematically preparing, managing, and conducting important quality checks on IPD from multiple studies for meta-analyses. Further testing of this framework in IPD-MA would be useful to refine these steps.
Subject(s)
Data Collection/statistics & numerical data, Data Collection/standards, Guidelines as Topic, Medical Records/statistics & numerical data, Medical Records/standards, Reference Standards, Reproducibility of Results, Statistical Data Interpretation, Humans
ABSTRACT
Fluorine-19 MRI shows great promise for a wide range of applications including renal imaging, yet the typically low signal-to-noise ratios and sparse signal distribution necessitate thorough data preparation. This chapter describes a general data preparation workflow for fluorine MRI experiments. The main processing steps are: (1) estimation of the noise level, (2) correction of noise-induced bias, and (3) background subtraction. The protocol is supplemented by an example script and toolbox available online. This chapter is based upon work from the COST Action PARENCHIMA, a community-driven network funded by the European Cooperation in Science and Technology (COST) program of the European Union, which aims to improve the reproducibility and standardization of renal MRI biomarkers. This analysis protocol chapter is complemented by two separate chapters describing the basic concept and the experimental procedure.
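A generic sketch of the three listed steps, assuming magnitude data with a signal-free background region; the exact estimators used in the chapter's toolbox may differ, and the synthetic data and function names below are assumptions.

```python
import numpy as np

def estimate_noise_sigma(background_region):
    """Estimate the Gaussian noise level from a signal-free background region.

    For magnitude MR data the background follows a Rayleigh distribution,
    whose mean equals sigma * sqrt(pi/2).
    """
    return np.mean(background_region) / np.sqrt(np.pi / 2)

def correct_rician_bias(magnitude, sigma):
    """First-order correction of the noise-induced bias in magnitude images."""
    return np.sqrt(np.clip(magnitude.astype(float) ** 2 - 2.0 * sigma ** 2, 0.0, None))

def subtract_background(image, background_level):
    """Simple background subtraction, clipped at zero."""
    return np.clip(image - background_level, 0.0, None)

# Synthetic example: true signal plus complex Gaussian noise, taken as magnitude.
rng = np.random.default_rng(0)
truth = np.zeros((64, 64))
truth[20:40, 20:40] = 50.0
noisy = np.abs(truth + rng.normal(0, 5, truth.shape) + 1j * rng.normal(0, 5, truth.shape))
sigma = estimate_noise_sigma(noisy[:10, :10])          # corner assumed to be signal-free
corrected = subtract_background(correct_rician_bias(noisy, sigma), 0.0)
```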
Subject(s)
Biomarkers/analysis, Fluorine-19 Magnetic Resonance Imaging/methods, Image Enhancement/methods, Computer-Assisted Image Processing/methods, Kidney/physiology, Signal-to-Noise Ratio, Software, Animals, Mice, Physiologic Monitoring, Rats
ABSTRACT
INTRODUCTION: Conflicting results on dementia risk factors have been reported across studies. We hypothesize that variation in data preparation methods may partially contribute to this issue. METHODS: We propose a comprehensive data preparation approach comparing individuals with stable diagnosis over time to those who progress to mild cognitive impairment (MCI)/dementia. This was compared to the often-used "baseline" analysis. Multivariate logistic regression was used to evaluate both methods. RESULTS: The results obtained from sensitivity analyses were consistent with those from our multi-time-point data preparation approach, exhibiting its robustness. Compared to analysis using only baseline data, the number of significant risk factors identified in progression analyses was substantially lower. Additionally, we found that moderate depression increased healthy-to-MCI/dementia risk, while hypertension reduced MCI-to-dementia risk. DISCUSSION: Overall, multi-time-point-based data preparation approaches may pave the way for a better understanding of dementia risk factors, and address some of the reproducibility issues in the field.
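As a hedged sketch of the multi-time-point idea, the code below compares the first and last available diagnoses per participant to separate stable individuals from those who progress to MCI/dementia. The simplified severity ordering, labeling rule, and toy data are assumptions, not the study's actual data preparation criteria.

```python
import pandas as pd

visits = pd.DataFrame({
    "subject_id": [1, 1, 1, 2, 2, 3, 3],
    "visit":      [0, 1, 2, 0, 1, 0, 1],
    "diagnosis":  ["healthy", "healthy", "MCI", "healthy", "healthy", "MCI", "dementia"],
})

SEVERITY = {"healthy": 0, "MCI": 1, "dementia": 2}

def label_trajectory(group):
    """Compare the first and last available diagnoses: 'stable' vs 'progressed' (simplified rule)."""
    ordered = group.sort_values("visit")["diagnosis"].map(SEVERITY)
    return "progressed" if ordered.iloc[-1] > ordered.iloc[0] else "stable"

labels = {sid: label_trajectory(g) for sid, g in visits.groupby("subject_id")}
print(labels)   # subjects 1 and 3 progressed; subject 2 stayed stable
```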
ABSTRACT
Although many works in the database community use open data in their experimental evaluation, repeating the empirical results of previous works remains a challenge. This holds true even if the source code or binaries of the tested algorithms are available. In this paper, we argue that providing access to the raw, original datasets is not enough. Real-world datasets are rarely processed without modification. Instead, the data is adapted to the needs of the experimental evaluation in the data preparation process. We showcase that the details of the data preparation process matter and subtle differences during data conversion can have a large impact on the outcome of runtime results. We introduce a data reproducibility model, identify three levels of data reproducibility, report about our own experience, and exemplify our best practices.