Data cleaning for clinician researchers: Application and explanation of a data-quality framework.

Pilowsky, Julia K; Elliott, Rosalind; Roche, Michael A

Affiliation

Pilowsky JK; Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, Australia; Faculty of Health, University of Technology Sydney, Sydney, NSW, Australia; Royal North Shore Hospital, Northern Sydney Local Health District, Sydney, NSW, Australia. Electronic address: Julia.Pilowsky@sydney.edu.au.
Elliott R; Faculty of Health, University of Technology Sydney, Sydney, NSW, Australia; Royal North Shore Hospital, Northern Sydney Local Health District, Sydney, NSW, Australia; Nursing and Midwifery Directorate, Northern Sydney Local Health District, Sydney, NSW, Australia.
Roche MA; Faculty of Health, University of Technology Sydney, Sydney, NSW, Australia; University of Canberra and ACT Health Directorate, Canberra, ACT, Australia.

Aust Crit Care; 2024 Apr 09.

Article in English | MEDLINE | ID: mdl-38600009

ABSTRACT

BACKGROUND:

Data cleaning is the series of procedures performed before a formal statistical analysis, with the aim of reducing the number of error values in a dataset and improving the overall quality of subsequent analyses. Several study-reporting guidelines recommend the inclusion of data-cleaning procedures; however, little practical guidance exists for how to conduct these procedures.

OBJECTIVES:

This paper aimed to provide practical guidance for how to perform and report rigorous data-cleaning procedures.

METHODS:

A previously proposed data-quality framework was identified and used to facilitate the description and explanation of data-cleaning procedures. The broader data-cleaning process was broken down into discrete tasks to create a data-cleaning checklist. Examples of how the various tasks had been undertaken for a previous study using data from the Australia and New Zealand Intensive Care Society Adult Patient Database were also provided.

RESULTS:

Data-cleaning tasks were described and grouped according to the four data-quality domains described in the framework: data integrity, consistency, completeness, and accuracy. Tasks described include creation of a data dictionary, checking consistency of values across multiple variables, quantifying and managing missing data, and the identification and management of outlier values. The data-cleaning task checklist provides a practical summary of the various aspects of the data-cleaning process and will assist clinician researchers in performing this process in the future.

CONCLUSIONS:

Data cleaning is an integral part of any statistical analysis and helps ensure that study results are valid and reproducible. Use of the data-cleaning task checklist will facilitate the conduct of rigorous data-cleaning processes, with the aim of improving the quality of future research.
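As an illustration only, the four data-quality domains named in the results (integrity, consistency, completeness, accuracy) could be sketched in Python roughly as follows. This is not the authors' checklist or code: the records, variable names, and the `DATA_DICTIONARY` structure are hypothetical, and the interquartile-range rule used to flag outliers is a common convention assumed here, not a method stated in the abstract.

```python
from statistics import quantiles

# Hypothetical data dictionary: variable name -> expected Python type.
DATA_DICTIONARY = {
    "age": int,
    "icu_admit_day": int,
    "icu_discharge_day": int,
    "heart_rate": int,
}

# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "age": 54, "icu_admit_day": 2, "icu_discharge_day": 5, "heart_rate": 82},
    {"id": 2, "age": 61, "icu_admit_day": 4, "icu_discharge_day": 3, "heart_rate": 90},
    {"id": 3, "age": None, "icu_admit_day": 1, "icu_discharge_day": 6, "heart_rate": 76},
    {"id": 4, "age": 47, "icu_admit_day": 3, "icu_discharge_day": 9, "heart_rate": 400},
    {"id": 5, "age": "70", "icu_admit_day": 2, "icu_discharge_day": 4, "heart_rate": 88},
    {"id": 6, "age": 66, "icu_admit_day": 5, "icu_discharge_day": 8, "heart_rate": 79},
    {"id": 7, "age": 58, "icu_admit_day": 1, "icu_discharge_day": 2, "heart_rate": 85},
    {"id": 8, "age": 73, "icu_admit_day": 6, "icu_discharge_day": 7, "heart_rate": None},
]


def check_integrity(rows, dictionary):
    """Integrity: list (id, variable) pairs whose value has the wrong type."""
    problems = []
    for row in rows:
        for var, expected_type in dictionary.items():
            value = row.get(var)
            if value is not None and not isinstance(value, expected_type):
                problems.append((row["id"], var))
    return problems


def check_consistency(rows):
    """Consistency: ids where discharge precedes admission."""
    bad = []
    for row in rows:
        admit, discharge = row["icu_admit_day"], row["icu_discharge_day"]
        if admit is not None and discharge is not None and discharge < admit:
            bad.append(row["id"])
    return bad


def count_missing(rows, dictionary):
    """Completeness: number of missing values per variable."""
    return {var: sum(1 for row in rows if row.get(var) is None) for var in dictionary}


def flag_outliers(rows, var, k=1.5):
    """Accuracy: ids whose value lies more than k * IQR outside the quartiles."""
    values = [row[var] for row in rows if isinstance(row.get(var), (int, float))]
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [
        row["id"]
        for row in rows
        if isinstance(row.get(var), (int, float)) and not low <= row[var] <= high
    ]


if __name__ == "__main__":
    print("Type problems:", check_integrity(records, DATA_DICTIONARY))
    print("Inconsistent stays:", check_consistency(records))
    print("Missing counts:", count_missing(records, DATA_DICTIONARY))
    print("Heart-rate outliers:", flag_outliers(records, "heart_rate"))
```

Each function only flags records for review. Deciding whether a flagged value is corrected, set to missing, or retained is a judgement the researcher would need to document and report.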

Keywords

Data cleaning; Data management; Initial data analysis; Quantitative methods

Full text: 1 Databases: MEDLINE Language: En Journal: Aust Crit Care Journal subject: NURSING / INTENSIVE CARE Year: 2024 Document type: Article