Evaluating the harmonisation potential of diverse cohort datasets.

Bauermeister, Sarah; Phatak, Mukta; Sparks, Kelly; Sargent, Lana; Griswold, Michael; McHugh, Caitlin; Nalls, Mike; Young, Simon; Bauermeister, Joshua; Elliott, Paul; Steptoe, Andrew; Porteous, David; Dufouil, Carole; Gallacher, John

Affiliation

Bauermeister S; Dementias Platform UK, Oxford, UK. sarah.bauermeister@psych.ox.ac.uk.
Phatak M; Alzheimer Disease Data Initiative, Kirkland, Washington, USA.
Sparks K; Evaluserve, Bengaluru, India.
Sargent L; National Institute of Aging, Bethesda, USA.
Griswold M; University of Mississippi, Oxford, USA.
McHugh C; Alzheimer Disease Data Initiative, Kirkland, Washington, USA.
Nalls M; Data Tecnica International LLC, Washington, USA.
Young S; Dementias Platform UK, Oxford, UK.
Bauermeister J; Dementias Platform UK, Oxford, UK.
Elliott P; Imperial College, London, England.
Steptoe A; University College London, London, England.
Porteous D; University of Edinburgh, Edinburgh, Scotland.
Dufouil C; University of Bordeaux, Bordeaux, France.
Gallacher J; Dementias Platform UK, Oxford, UK.

Eur J Epidemiol; 38(6): 605-615, 2023 Jun.

Article in English | MEDLINE | ID: mdl-37099244

ABSTRACT

Data discovery, the ability to find datasets relevant to an analysis, increases scientific opportunity, improves rigour and accelerates activity. Rapid growth in the depth, breadth, quantity and availability of data provides unprecedented opportunities and challenges for data discovery. A potential tool for increasing the efficiency of data discovery, particularly across multiple datasets, is data harmonisation.

A set of 124 variables, identified as being of broad interest to neurodegeneration, were harmonised using the C-Surv data model. Harmonisation strategies used were simple calibration, algorithmic transformation and standardisation to the Z-distribution. Widely used data conventions, optimised for inclusiveness rather than aetiological precision, were used as harmonisation rules. The harmonisation scheme was applied to data from four diverse population cohorts.

Of the 120 variables that were found in the datasets, correspondence between the harmonised data schema and cohort-specific data models was complete or close for 111 (93%). For the remainder, harmonisation was possible with a marginal loss of granularity.

Although harmonisation is not an exact science, sufficient comparability across datasets was achieved to enable data discovery with relatively little loss of informativeness. This provides a basis for further work extending harmonisation to a larger variable list, applying the harmonisation to further datasets, and incentivising the development of data discovery tools.
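The abstract names three harmonisation strategies without giving the rules themselves. As a rough illustration only, the sketch below shows what each strategy could look like for a single variable. The function names, the unit conversion, and the education thresholds are hypothetical and are not taken from the C-Surv data model or from the paper.

```python
from statistics import mean, pstdev


def calibrate(value, factor, offset=0.0):
    """Simple calibration: a linear rescaling, e.g. a unit conversion."""
    return value * factor + offset


def education_level(years):
    """Algorithmic transformation: map a continuous measure to categories.

    The cut-points (10 and 14 years) are invented for illustration.
    """
    if years < 10:
        return "low"
    if years < 14:
        return "medium"
    return "high"


def z_scores(values):
    """Standardisation to the Z-distribution within one cohort.

    Uses the population standard deviation; a real scheme might
    standardise against a reference sample instead.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardise a constant variable")
    return [(v - mu) / sigma for v in values]


# Hypothetical cohort values
height_cm = calibrate(1.75, 100.0)           # metres to centimetres
level = education_level(12)                   # "medium" under the assumed cut-points
scores = z_scores([2, 4, 4, 4, 5, 5, 7, 9])   # mean 5, SD 2
```

Standardising within each cohort makes scores comparable in relative terms but discards the original scale, which is one way the loss of granularity mentioned in the abstract can arise.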

Subject(s)

Datasets as Topic; Knowledge Discovery; Humans; Reference Standards

Keywords

C-surv data model; Cohort; Data discovery; Data harmonisation; Data visualisation; Datasets

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Knowledge Discovery / Datasets as Topic | Study type: Prognostic_studies | Limit: Humans | Language: En | Journal: Eur J Epidemiol | Journal subject: EPIDEMIOLOGY | Year: 2023 | Document type: Article | Affiliation country: United Kingdom