Results 1 - 20 of 139
1.
Am J Epidemiol ; 193(1): 214-226, 2024 Jan 08.
Article in English | MEDLINE | ID: mdl-37667811

ABSTRACT

Postnatal mental health is often assessed using self-assessment questionnaires in epidemiologic research. Differences in response style, influenced by language, culture, and experience, may mean that the same response may not have the same meaning in different settings. These differences need to be identified and accounted for in cross-cultural comparisons. Here we describe the development and application of anchoring vignettes to investigate the cross-cultural functioning of the Edinburgh Postnatal Depression Scale (EPDS) in urban community samples in India (n = 549) and the United Kingdom (n = 828), alongside a UK calibration sample (n = 226). Participants completed the EPDS and anchoring vignettes when their children were 12-24 months old. In an unadjusted item-response theory model, UK mothers reported higher depressive symptoms than Indian mothers (d = 0.48, 95% confidence interval: 0.358, 0.599). Following adjustment for differences in response style, these positions were reversed (d = -0.25, 95% confidence interval: -0.391, -0.103). Response styles vary between India and the United Kingdom, indicating a need to take these differences into account when making cross-cultural comparisons. Anchoring vignettes offer a valid and feasible method for global data harmonization.


Subject(s)
Postpartum Depression , Female , Child , Humans , Infant , Preschool Child , Postpartum Depression/diagnosis , Postpartum Depression/psychology , Mothers/psychology , United Kingdom , Surveys and Questionnaires , Mental Health , Psychiatric Status Rating Scales
2.
Pediatr Blood Cancer ; 71(2): e30745, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37889049

ABSTRACT

In March 2023, over 800 researchers, clinicians, patients, survivors, and advocates from the pediatric oncology community met to discuss the progress of the National Cancer Institute's Childhood Cancer Data Initiative. We present here the status of the initiative's efforts in building its data ecosystem and updates on key programs, especially the Molecular Characterization Initiative and the planned Coordinated National Initiative for Rare Cancers in Children and Young Adults. These activities aim to improve access to childhood cancer data, foster collaborations, facilitate integrative data analysis, and expand access to molecular characterization, ultimately leading to the development of innovative therapeutic approaches.


Subject(s)
Neoplasms , Humans , Child , Neoplasms/therapy , Ecosystem , Medical Oncology
3.
Environ Sci Technol ; 58(27): 12260-12271, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38923944

ABSTRACT

Despite the critical importance of virus disinfection by chlorine, our fundamental understanding of the relative susceptibility of different viruses to chlorine and robust quantitative relationships between virus disinfection rate constants and environmental parameters remains limited. We conducted a systematic review of virus inactivation by free chlorine and used the resulting data set to develop a linear mixed model that estimates chlorine inactivation rate constants for viruses based on experimental conditions. 570 data points were collected in our systematic review, representing 82 viruses over a broad range of environmental conditions. The harmonized inactivation rate constants under reference conditions (pH = 7.53, T = 20 °C, [Cl⁻] < 50 mM) spanned 5 orders of magnitude, ranging from 0.0196 to 1150 L mg⁻¹ min⁻¹, and uncovered important trends between viruses. Whereas common surrogate bacteriophage MS2 does not serve as a conservative chlorine disinfection surrogate for many human viruses, CVB5 was one of the most resistant viruses in the data set. The model quantifies the role of pH, temperature, and chloride levels across viruses, and an online tool allows users to estimate rate constants for viruses and conditions of interest. Results from the model identified potential shortcomings in current U.S. EPA drinking water disinfection requirements.


Subject(s)
Chlorine , Disinfection , Chlorine/pharmacology , Virus Inactivation/drug effects , Viruses/drug effects , Disinfectants/pharmacology
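
To make the units of the rate constants in record 3 concrete, here is a minimal sketch that converts a rate constant into the chlorine dose (CT, in mg·min/L) needed for a target log10 reduction. It assumes Chick-Watson kinetics, ln(N/N0) = -k·C·t, with k on a natural-log basis; the abstract states neither the log base nor the functional form of the authors' model, so both are assumptions, and the function name is hypothetical.

```python
import math


def ct_for_log_reduction(k_l_per_mg_min: float, log10_reduction: float) -> float:
    """Return the CT value (mg*min/L) needed for a given log10 reduction.

    Assumes ln(N/N0) = -k * C * t (Chick-Watson, base-e rate constant),
    which is an illustrative assumption, not the published model.
    """
    if k_l_per_mg_min <= 0:
        raise ValueError("rate constant must be positive")
    if log10_reduction < 0:
        raise ValueError("log10 reduction must be non-negative")
    # A reduction of n log10 units means ln(N0/N) = n * ln(10).
    return log10_reduction * math.log(10) / k_l_per_mg_min


# The two extremes of the harmonized rate constants quoted in the abstract.
for k in (0.0196, 1150.0):
    ct = ct_for_log_reduction(k, 4.0)
    print(f"k = {k} L/(mg*min): CT for 4-log reduction = {ct:.4g} mg*min/L")
```

Because CT is inversely proportional to k under this assumption, the five orders of magnitude in k translate directly into five orders of magnitude in required dose.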
4.
Eur J Epidemiol ; 39(7): 773-783, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38805076

ABSTRACT

While its etiology is not fully elucidated, preterm birth represents a major public health concern as it is the leading cause of child mortality and morbidity. Stress is one of the most common perinatal conditions and may increase the risk of preterm birth. In this paper we aimed to investigate the association of maternal perceived stress and anxiety with length of gestation. We used harmonized data from five birth cohorts from Canada, France, and Norway. A total of 5297 pregnancies of singletons were included in the analysis of perceived stress and gestational duration, and 55,775 pregnancies for anxiety. Federated analyses were performed through the DataSHIELD platform using Cox regression models within intervals of gestational age. The models were fit for each cohort separately, and the cohort-specific results were combined using random effects study-level meta-analysis. Moderate and high levels of perceived stress during pregnancy were associated with a shorter length of gestation in the very/moderately preterm interval [moderate: hazard ratio (HR) 1.92 (95%CI 0.83, 4.48); high: 2.04 (95%CI 0.77, 5.37)], albeit not statistically significant. No association was found for the other intervals. Anxiety was associated with gestational duration in the very/moderately preterm interval [1.66 (95%CI 1.32, 2.08)], and in the early term interval [1.15 (95%CI 1.08, 1.23)]. Our findings suggest that perceived stress and anxiety are associated with an increased risk of earlier birth, but only in the earliest gestational ages. We also found an association in the early term period for anxiety, but the result was only driven by the largest cohort, which collected information the latest in pregnancy. This raised a potential issue of reverse causality as anxiety later in pregnancy could be due to concerns about early signs of a possible preterm birth.


Subject(s)
Anxiety , Gestational Age , Premature Birth , Psychological Stress , Humans , Female , Pregnancy , Psychological Stress/epidemiology , Anxiety/epidemiology , Canada/epidemiology , Adult , Premature Birth/epidemiology , Premature Birth/psychology , Birth Cohort , Pregnancy Complications/epidemiology , Pregnancy Complications/psychology , Cohort Studies , Risk Factors , Newborn Infant , Proportional Hazards Models , Norway/epidemiology
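
The study-level pooling step described in record 4 can be sketched as follows. The abstract says only that cohort-specific results were combined by random-effects meta-analysis; the DerSimonian-Laird estimator used here is an assumption, the function name is hypothetical, and the input numbers are made up for illustration rather than taken from the study.

```python
import math


def pool_random_effects(log_hrs, std_errs):
    """Pool per-cohort log hazard ratios with DerSimonian-Laird weights.

    Returns (pooled_log_hr, pooled_std_err, tau_squared).
    """
    if len(log_hrs) != len(std_errs) or len(log_hrs) < 2:
        raise ValueError("need matching estimates from at least two cohorts")
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in std_errs]
    fixed = sum(wi * yi for wi, yi in zip(w, log_hrs)) / sum(w)
    # Cochran's Q and the method-of-moments between-cohort variance.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_hrs))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(log_hrs) - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau2 to each within-cohort variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errs]
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_hrs)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2


# Made-up cohort-level estimates (log hazard ratios and standard errors).
pooled, se, tau2 = pool_random_effects([0.40, 0.65, 0.10], [0.20, 0.30, 0.25])
print(f"pooled HR = {math.exp(pooled):.2f}, tau^2 = {tau2:.3f}")
```

Only the cohort-level estimates and standard errors enter the pooling, which is what makes this approach compatible with a federated setup where individual-level data stay at each site.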
5.
J Biomed Inform ; 155: 104661, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38806105

ABSTRACT

BACKGROUND: Establishing collaborations between cohort studies has been fundamental for progress in health research. However, such collaborations are hampered by heterogeneous data representations across cohorts and legal constraints to data sharing. The first arises from a lack of consensus in standards of data collection and representation across cohort studies and is usually tackled by applying data harmonization processes. The second is increasingly important due to heightened awareness of privacy protection and stricter regulations, such as the GDPR. Federated learning has emerged as a privacy-preserving alternative to transferring data between institutions by analyzing data in a decentralized manner. METHODS: In this study, we set up a federated learning infrastructure for a consortium of nine Dutch cohorts with data relevant to the etiology of dementia, including an extract, transform, and load (ETL) pipeline for data harmonization. Additionally, we assessed the challenges of transforming and standardizing cohort data using the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) and evaluated our tool in one of the cohorts employing federated algorithms. RESULTS: We successfully applied our ETL tool and observed complete coverage of the cohorts' data by the OMOP CDM. The OMOP CDM facilitated the data representation and standardization, but we identified limitations for cohort-specific data fields and in the scope of the vocabularies available. Specific challenges arise in a multi-cohort federated collaboration due to technical constraints in local environments, data heterogeneity, and lack of direct access to the data. CONCLUSION: In this article, we describe the solutions to these challenges and limitations encountered in our study. Our study shows the potential of federated learning as a privacy-preserving solution for multi-cohort studies that enhances reproducibility and reuse of both data and analyses.


Subject(s)
Dementia , Humans , Netherlands , Cohort Studies , Algorithms , Information Dissemination/methods , Biomedical Research
6.
BMC Med Inform Decis Mak ; 24(1): 58, 2024 Feb 26.
Article in English | MEDLINE | ID: mdl-38408983

ABSTRACT

BACKGROUND: To gain insight into the real-life care of patients in the healthcare system, data from hospital information systems and insurance systems are required. Consequently, linking clinical data with claims data is necessary. To ensure their syntactic and semantic interoperability, the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) from the Observational Health Data Sciences and Informatics (OHDSI) community was chosen. However, there is no detailed guide that would allow researchers to follow a generic process for data harmonization, i.e., the transformation of local source data into the standardized OMOP CDM format. Thus, the aim of this paper is to conceptualize a generic data harmonization process for OMOP CDM. METHODS: For this purpose, we conducted a literature review focusing on publications that address the harmonization of clinical or claims data in OMOP CDM. Subsequently, the process steps used and their chronological order as well as applied OHDSI tools were extracted for each included publication. The results were then compared to derive a generic sequence of the process steps. RESULTS: From 23 publications included, a generic data harmonization process for OMOP CDM was conceptualized, consisting of nine process steps: dataset specification, data profiling, vocabulary identification, coverage analysis of vocabularies, semantic mapping, structural mapping, extract-transform-load process, qualitative data quality analysis, and quantitative data quality analysis. Furthermore, we identified seven OHDSI tools which supported five of the process steps. CONCLUSIONS: The generic data harmonization process can be used as a step-by-step guide to assist other researchers in harmonizing source data in OMOP CDM.


Subject(s)
Medical Informatics , Vocabulary , Humans , Factual Databases , Data Science , Semantics , Electronic Health Records
7.
Am J Epidemiol ; 192(12): 2033-2049, 2023 Nov 10.
Article in English | MEDLINE | ID: mdl-37403415

ABSTRACT

The Preconception Period Analysis of Risks and Exposures Influencing Health and Development (PrePARED) Consortium creates a novel resource for addressing preconception health by merging data from numerous cohort studies. In this paper, we describe our data harmonization methods and results. Individual-level data from 12 prospective studies were pooled. The crosswalk-cataloging-harmonization procedure was used. The index pregnancy was defined as the first postbaseline pregnancy lasting more than 20 weeks. We assessed heterogeneity across studies by comparing preconception characteristics in different types of studies. The pooled data set included 114,762 women, and 25,531 (22%) reported at least 1 pregnancy of more than 20 weeks' gestation during the study period. Babies from the index pregnancies were delivered between 1976 and 2021 (median, 2008), at a mean maternal age of 29.7 (standard deviation, 4.6) years. Before the index pregnancy, 60% of women were nulligravid, 58% had a college degree or more, and 37% were overweight or obese. Other harmonized variables included race/ethnicity, household income, substance use, chronic conditions, and perinatal outcomes. Participants from pregnancy-planning studies had more education and were healthier. The prevalence of preexisting medical conditions did not vary substantially based on whether studies relied on self-reported data. Use of harmonized data presents opportunities to study uncommon preconception risk factors and pregnancy-related events. This harmonization effort laid the groundwork for future analyses and additional data harmonization.


Subject(s)
Health Status , Pregnancy , Humans , Female , Preschool Child , Prospective Studies , Risk Factors
8.
Brief Bioinform ; 22(2): 664-675, 2021 Mar 22.
Article in English | MEDLINE | ID: mdl-33348368

ABSTRACT

With the outbreak of COVID-19, the research community is making unprecedented efforts to better understand and mitigate the effects of the pandemic. In this context, we review the data integration efforts required for accessing and searching genome sequences and metadata of SARS-CoV-2, the virus responsible for COVID-19, which have been deposited into the most important repositories of viral sequences. Organizations that were already present in the virus domain are now dedicating special interest to the emergence of the COVID-19 pandemic, by emphasizing specific SARS-CoV-2 data and services. At the same time, novel organizations and resources were born in this critical period to serve specifically the purposes of COVID-19 mitigation while laying the research groundwork for countering possible future pandemics. Accessibility and integration of viral sequence data, possibly in conjunction with the human host genotype and clinical data, are paramount to better understand COVID-19 and mitigate its effects. Few examples of host-pathogen integrated datasets exist so far, but we expect them to grow together with the knowledge of COVID-19; once such datasets are available, useful integrative surveillance mechanisms can be put in place by observing how common variants distribute in time and space, relating them to the phenotypic impact evidenced in the literature.


Subject(s)
COVID-19/therapy , COVID-19/epidemiology , COVID-19/virology , Viral Genes , Humans , Information Storage and Retrieval , Pandemics , SARS-CoV-2/genetics , SARS-CoV-2/isolation & purification
9.
BMC Med Res Methodol ; 23(1): 240, 2023 Oct 18.
Article in English | MEDLINE | ID: mdl-37853326

ABSTRACT

BACKGROUND: Data harmonisation is essential in real-world data (RWD) research projects based on hospital information systems databases, as coding systems differ between countries. The Hungarian hospital information systems and the national claims database use internationally known diagnosis codes, but data on medical procedures are recorded using national codes. There is no simple or standard solution for mapping the national codes to a standard coding system. Our aim was to map the Hungarian procedure codes (OENO) to SNOMED CT as part of the European Health Data Evidence Network (EHDEN) project. METHODS: We recruited 25 professionals from different specialties to manually map the procedure codes used between 2011 and 2021. A mapping protocol and training material were developed, results were regularly revised, and the challenges of mapping were recorded. Approximately 7% of the codes were mapped by several people from different specialties for validation purposes. RESULTS: We mapped 4661 OENO codes to standard vocabularies, mostly SNOMED CT. We categorized the challenges into three main areas: semantic, matching, and methodological. Semantic challenges refer to the occasionally unclear meaning of the OENO codes; matching challenges refer to the different granularity and purpose of the OENO and SNOMED CT vocabularies. Lastly, methodological challenges describe issues related to the design of these two vocabularies. CONCLUSIONS: The challenges and solutions presented here may help other researchers to design their process to map their national codes to standard vocabularies in order to achieve greater consistency in mapping results. Moreover, we believe that our work will allow for better use of RWD collected in Hungary in international research collaborations.


Subject(s)
Computerized Medical Records Systems , Systematized Nomenclature of Medicine , Humans , Hungary , Records , Factual Databases
10.
Eur J Epidemiol ; 38(10): 1043-1052, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37555907

ABSTRACT

Periodic revisions of the International Classification of Diseases (ICD) ensure that the classification reflects new practices and knowledge; however, this complicates retrospective research as diagnoses are coded in different versions. For longitudinal disease trajectory studies, a crosswalk is an essential tool and a comprehensive mapping between ICD-8 and ICD-10 has until now been lacking. In this study, we map all ICD-8 morbidity codes to ICD-10 in the expanded Danish ICD version. We mapped ICD-8 codes to ICD-10, using a many-to-one system inspired by general equivalence mappings such that each ICD-8 code maps to a single ICD-10 code. Each ICD-8 code was manually and unidirectionally mapped to a single ICD-10 code based on medical setting and context. Each match was assigned a score (1 of 4 levels) reflecting the quality of the match and, if applicable, a "flag" signalling choices made in the mapping. We provide the first complete mapping of the 8596 ICD-8 morbidity codes to ICD-10 codes. All Danish ICD-8 codes representing diseases were mapped and 5106 (59.4%) achieved the highest consistency score. Only 334 (3.9%) of the ICD-8 codes received the lowest mapping consistency score. The mapping provides a scaffold for translation of ICD-8 to ICD-10, which enables longitudinal disease studies back to 1969 in Denmark and, with further adaptation, to 1965 internationally.
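
The many-to-one crosswalk described in record 10 can be pictured as a lookup table in which every source code appears exactly once and carries a quality score and an optional flag. The sketch below is a hypothetical illustration of that structure: the class and function names are invented, the codes are placeholders rather than real ICD-8 or ICD-10 codes, and the abstract does not say which end of the four-level score is best.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Mapping:
    target: str                 # the single target code for this source code
    score: int                  # one of four quality levels (1..4)
    flag: Optional[str] = None  # optional note on a choice made while mapping


def build_crosswalk(
    rows: Iterable[Tuple[str, str, int, Optional[str]]]
) -> Dict[str, Mapping]:
    """Build a many-to-one crosswalk, rejecting duplicate source codes."""
    crosswalk: Dict[str, Mapping] = {}
    for source, target, score, flag in rows:
        if source in crosswalk:
            raise ValueError(f"source code {source!r} mapped more than once")
        if score not in (1, 2, 3, 4):
            raise ValueError(f"score for {source!r} must be 1, 2, 3 or 4")
        crosswalk[source] = Mapping(target, score, flag)
    return crosswalk


def translate(crosswalk: Dict[str, Mapping], codes: Iterable[str]) -> List[str]:
    """Translate source codes; unknown codes raise KeyError."""
    return [crosswalk[code].target for code in codes]


# Placeholder codes: two source codes share one target (many-to-one).
rows = [
    ("OLD-001", "NEW-A", 1, None),
    ("OLD-002", "NEW-A", 3, "broader target category"),
    ("OLD-003", "NEW-B", 2, None),
]
crosswalk = build_crosswalk(rows)
print(translate(crosswalk, ["OLD-002", "OLD-003"]))
```

Keying the table on the source code enforces the one-target-per-source rule, while several source codes remain free to share a target.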

11.
Epilepsy Behav ; 142: 109190, 2023 May.
Article in English | MEDLINE | ID: mdl-37011527

ABSTRACT

Our study assessed diffusion tensor imaging (DTI) metrics of fractional anisotropy (FA), mean diffusivity (MD), and radial diffusivity (RD) in pediatric subjects with epilepsy secondary to Focal Cortical Dysplasia (FCD) to improve our understanding of structural network changes associated with FCD-related epilepsy. We utilized a data harmonization (DH) approach to minimize confounding effects induced by MRI protocol differences. We also assessed correlations between DTI metrics and neurocognitive measures of the fluid reasoning index (FRI), verbal comprehension index (VCI), and visuospatial index (VSI). Data (n = 51) from 23 FCD patients and 28 typically developing controls (TD) scanned clinically on either 1.5T, 3T, or 3T-wide-bore MRI were retrospectively analyzed. Tract-based spatial statistics (TBSS) with threshold-free cluster enhancement and permutation testing with 100,000 permutations were used for statistical analysis. To account for imaging protocol differences, we employed non-parametric data harmonization prior to permutation testing. Our analysis demonstrates that DH effectively removed MRI protocol-based differences typical in clinical acquisitions while preserving group differences in DTI metrics between FCD and TD subjects. Furthermore, DH strengthened the association between DTI metrics and neurocognitive indices. FA, MD, and RD metrics showed stronger correlation with FRI and VSI than with VCI. Our results demonstrate that DH is an integral step to reduce the confounding effect of MRI protocol differences during the analysis of white matter tracts and to highlight biological differences between FCD and healthy control subjects. Characterization of white matter changes associated with FCD-related epilepsy may better inform prognosis and treatment approaches.


Subject(s)
Epilepsy , Focal Cortical Dysplasia , White Matter , Humans , Child , Diffusion Tensor Imaging/methods , White Matter/diagnostic imaging , Retrospective Studies , Anisotropy , Brain/diagnostic imaging
12.
BMC Pregnancy Childbirth ; 23(1): 128, 2023 Feb 28.
Article in English | MEDLINE | ID: mdl-36855094

ABSTRACT

BACKGROUND: As a teratogen, alcohol exposure during pregnancy can impact fetal development and result in adverse birth outcomes. Despite the clinical and social importance of prenatal alcohol use, limited routinely collected information or epidemiological data exists in Canada. The aim of this study was to pool data from multiple Canadian cohort studies to identify sociodemographic characteristics before and during pregnancy that were associated with alcohol consumption during pregnancy and to assess the impact of different patterns of alcohol use on birth outcomes. METHODS: We harmonized information collected (e.g., pregnant women's alcohol intake, infants' gestational age and birth weight) from five Canadian pregnancy cohort studies to consolidate a large sample (n = 11,448). Risk factors for any alcohol use during pregnancy, including any alcohol use prior to pregnancy recognition, and binge drinking, were estimated using binomial regressions including fixed effects of pregnancy cohort membership and multiple maternal risk factors. Impacts of alcohol use during pregnancy on birth outcomes (preterm birth and low birth weight for gestational age) were also estimated using binomial regression models. RESULTS: In analyses adjusting for multiple risk factors, women's alcohol use during pregnancy, both any use and any binge drinking, was associated with drinking prior to pregnancy, smoking during pregnancy, and white ethnicity. Higher income level was associated with any drinking during pregnancy. Neither drinking during pregnancy nor binge drinking during pregnancy was significantly associated with preterm delivery or low birth weight for gestational age in our sample. CONCLUSIONS: Pooling data across pregnancy cohort studies allowed us to create a large sample of Canadian women and investigate the risk factors for alcohol consumption during pregnancy. We suggest that future pregnancy and birth cohorts should always include questions related to the frequency and amount of alcohol consumed before and during pregnancy that are prospectively harmonized to support data reusability and collaborative research.


Subject(s)
Binge Drinking , Premature Birth , Prenatal Exposure Delayed Effects , Newborn Infant , Pregnancy , Infant , Female , Humans , Pregnancy Outcome/epidemiology , Premature Birth/epidemiology , Premature Birth/etiology , Binge Drinking/epidemiology , Canada/epidemiology , Prenatal Exposure Delayed Effects/epidemiology , Cohort Studies , Ethanol
13.
J Med Internet Res ; 25: e45599, 2023 Jul 19.
Article in English | MEDLINE | ID: mdl-37467026

ABSTRACT

BACKGROUND: Cardiovascular disease accounts for 17.9 million deaths globally each year. Many research study data sets have been collected to answer questions regarding the relationship between cardiometabolic health and accelerometer-measured physical activity. This scoping review aimed to map the available data sets that have collected accelerometer-measured physical activity and cardiometabolic health markers. These data were then used to inform the development of a publicly available resource, the Global Physical Activity Data set (GPAD) catalogue. OBJECTIVE: This review aimed to systematically identify data sets that have measured physical activity using accelerometers and cardiometabolic health markers using either an observational or interventional study design. METHODS: Databases, trial registries, and gray literature (inception until February 2021; updated search from February 2021 to September 2022) were systematically searched to identify studies that analyzed data sets of physical activity and cardiometabolic health outcomes. To be eligible for inclusion, data sets must have measured physical activity using an accelerometric device in adults aged ≥18 years; had a sample size >400 participants (unless participants were recruited in a low- or middle-income country, in which case the sample size threshold was reduced to 100); used an observational, longitudinal, or trial-based study design; and collected at least 1 cardiometabolic health marker (unless only body mass was measured). Two reviewers screened the search results to identify eligible studies, and from these, the unique names of each data set were recorded, and characteristics about each data set were extracted from several sources. RESULTS: A total of 17,391 study reports were identified, and after screening, 319 were eligible, with 122 unique data sets in these study reports meeting the review inclusion criteria. Data sets were found in 49 countries across 5 continents, with the most developed in Europe (n=53) and the least in Africa and Oceania (n=4 and n=3, respectively). The most common accelerometric brand and device wear location was Actigraph and the waist, respectively. Height and body mass were the most frequently measured cardiometabolic health markers in the data sets (119/122, 97.5% data sets), followed by blood pressure (82/122, 67.2% data sets). The number of participants in the included data sets ranged from 103,712 to 120. Once the review processes had been completed, the GPAD catalogue was developed to house all the identified data sets. CONCLUSIONS: This review identified and mapped the contents of data sets from around the world that have collected potentially harmonizable accelerometer-measured physical activity and cardiometabolic health markers. The GPAD catalogue is a web-based open-source resource developed from the results of this review, which aims to facilitate the harmonization of data sets to produce evidence that will reduce the burden of disease from physical inactivity.


Subject(s)
Cardiovascular Diseases , Exercise , Adult , Humans , Adolescent , Exercise/physiology , Cardiovascular Diseases/prevention & control , Blood Pressure , Accelerometry , Europe , Observational Studies as Topic
14.
BMC Med Inform Decis Mak ; 23(Suppl 1): 151, 2023 Aug 04.
Article in English | MEDLINE | ID: mdl-37542312

ABSTRACT

BACKGROUND: In the United States, the National Alzheimer's Coordinating Center (NACC) and the Alzheimer's Disease Neuroimaging Initiative (ADNI) are two major data sharing resources for Alzheimer's Disease (AD) research. NACC and ADNI strive to make their data more FAIR (findable, accessible, interoperable, and reusable) for the broader research community. However, there is limited work harmonizing and supporting cross-cohort interoperability of the two resources. METHOD: In this paper, we leverage an ontology-based approach to harmonize data elements in the two resources and develop a web-based query system to search patient cohorts across the two resources. We first mapped data elements across NACC and ADNI, and performed value harmonization for the mapped data elements with inconsistent permissible values. Then we built an Alzheimer's Disease Data Element Ontology (ADEO) to model the mapped data elements in NACC and ADNI. We further developed a prototype cross-cohort query system to search patient cohorts across NACC and ADNI. RESULTS: After manual review, we found 172 mappings between NACC and ADNI. These 172 mappings were further used to construct common concepts in ADEO. Our data element mapping and harmonization resulted in five files storing common concepts, variables in NACC and ADNI, mappings between variables and common concepts, permissible values of categorical type data elements, and coding inconsistency harmonization, respectively. Our cross-cohort query system consists of three core architectural elements: a web-based interface, an advanced query engine, and a backend MongoDB database. CONCLUSIONS: In this work, ADEO has been specifically designed to facilitate data harmonization and cross-cohort query of NACC and ADNI data resources. Although our prototype cross-cohort query system was developed for exploring NACC and ADNI, its backend and frontend framework has been designed and implemented to be generally applicable to other domains for querying patient cohorts from multiple heterogeneous data sources.


Subject(s)
Alzheimer Disease , Humans , United States , Alzheimer Disease/diagnostic imaging , Neuroimaging
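
The two steps named in record 14, mapping data elements to common concepts and harmonizing inconsistent permissible values, can be sketched with two lookup tables. Everything here is hypothetical: the source names, variable names, and codings are invented for illustration and do not reproduce the actual NACC or ADNI data dictionaries or the authors' files.

```python
from typing import Any, Dict, Tuple

# (source, variable) -> common concept. Hypothetical names throughout.
VARIABLE_TO_CONCEPT: Dict[Tuple[str, str], str] = {
    ("cohort_a", "sex_code"): "sex",
    ("cohort_b", "gender"): "sex",
    ("cohort_a", "visit_age"): "age_years",
    ("cohort_b", "age"): "age_years",
}

# (source, variable) -> mapping from local codes to shared permissible values.
# Variables absent from this table are assumed to need no value harmonization.
VALUE_TO_COMMON: Dict[Tuple[str, str], Dict[Any, str]] = {
    ("cohort_a", "sex_code"): {1: "male", 2: "female"},
    ("cohort_b", "gender"): {"M": "male", "F": "female"},
}


def harmonize(source: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Rename mapped variables to common concepts and recode their values."""
    harmonized: Dict[str, Any] = {}
    for variable, value in record.items():
        key = (source, variable)
        concept = VARIABLE_TO_CONCEPT.get(key)
        if concept is None:
            continue  # variable has no mapping, so it is left out
        value_map = VALUE_TO_COMMON.get(key)
        # Unknown local codes raise KeyError instead of passing through silently.
        harmonized[concept] = value_map[value] if value_map is not None else value
    return harmonized


print(harmonize("cohort_a", {"sex_code": 2, "visit_age": 71}))
print(harmonize("cohort_b", {"gender": "F", "age": 71, "site": "x"}))
```

Once both records share concept names and value codings, a single query such as "sex is female and age_years is at least 70" can run unchanged against either source.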
15.
Prev Sci ; 24(8): 1595-1607, 2023 Nov.
Article in English | MEDLINE | ID: mdl-36441362

ABSTRACT

Combining datasets in an integrative data analysis (IDA) requires researchers to make a number of decisions about how best to harmonize item responses across datasets. This entails two sets of steps: logical harmonization, which involves combining items which appear similar across datasets, and analytic harmonization, which involves using psychometric models to find and account for cross-study differences in measurement. Embedded in logical and analytic harmonization are many decisions, from deciding whether items can be combined prima facie to how best to find covariate effects on specific items. Researchers may not have specific hypotheses about these decisions, and each individual choice may seem arbitrary, but the cumulative effects of these decisions are unknown. In the current study, we conducted an IDA of the relationship between alcohol use and delinquency using three datasets (total N = 2245). For analytic harmonization, we used moderated nonlinear factor analysis (MNLFA) to generate factor scores for delinquency. We conducted both logical and analytic harmonization 72 times, each time making a different set of decisions. We assessed the cumulative influence of these decisions on MNLFA parameter estimates, factor scores, and estimates of the relationship between delinquency and alcohol use. There were differences across paths in MNLFA parameter estimates, but fewer differences in estimates of factor scores and regression parameters linking delinquency to alcohol use. These results suggest that factor scores may be relatively robust to subtly different decisions in data harmonization, and measurement model parameters are less so.


Subject(s)
Alcohol Drinking , Data Analysis , Humans , Psychometrics , Factor Analysis
16.
Alzheimers Dement ; 19(8): 3365-3378, 2023 Aug.
Article in English | MEDLINE | ID: mdl-36790027

ABSTRACT

INTRODUCTION: Sex differences in dementia risk, and risk factor (RF) associations with dementia, remain uncertain across diverse ethno-regional groups. METHODS: A total of 29,850 participants (58% women) from 21 cohorts across six continents were included in an individual participant data meta-analysis. Sex-specific hazard ratios (HRs), and women-to-men ratio of hazard ratios (RHRs) for associations between RFs and all-cause dementia were derived from mixed-effect Cox models. RESULTS: Incident dementia occurred in 2089 (66% women) participants over 4.6 years (median). Women had higher dementia risk (HR, 1.12 [1.02, 1.23]) than men, particularly in low- and lower-middle-income economies. Associations between longer education and former alcohol use with dementia risk (RHR, 1.01 [1.00, 1.03] per year, and 0.55 [0.38, 0.79], respectively) were stronger for men than women; otherwise, there were no discernible sex differences in other RFs. DISCUSSION: Dementia risk was higher in women than men, with possible variations by country-level income settings, but most RFs appear to work similarly in women and men.


Subject(s)
Dementia , Sex Characteristics , Humans , Male , Female , Risk Factors , Alcohol Drinking , Dementia/epidemiology , Sex Factors
17.
Statistics (Ber) ; 57(5): 987-1009, 2023.
Article in English | MEDLINE | ID: mdl-38283617

ABSTRACT

Multi-center study designs are increasingly used to borrow strength from multiple research groups and obtain broadly applicable, reproducible findings. Regression analysis is widely used for analyzing multi-group studies; however, in many large-scale collaborative studies some of the many regression predictors have nonlinear effects and/or are measured with batch effects. In addition, the group compositions of the nonlinear predictors are potentially heterogeneous across centers. Conventional pooled data analysis ignores the interplay between nonlinearity and batch effects, group composition heterogeneity, measurement error, and other data incoherence in the multi-center setting, which can cause biased regression estimates and misleading results. In this paper, we propose an integrated partially linear regression model (IPLM) based analysis to account simultaneously for predictor nonlinearity, general batch effects, group composition heterogeneity, high-dimensional covariates, potential measurement error in covariates, and combinations of these complexities. A local linear regression based approach is employed to estimate the nonlinear component, and a regularization procedure is introduced to identify predictor effects that can be either homogeneous or heterogeneous across groups. In particular, when the effects of all predictors are homogeneous across the study centers, the proposed IPLM automatically reduces to a single parsimonious partially linear model for all centers. The proposed method achieves asymptotic estimation consistency and variable selection consistency, including with high-dimensional covariates. Moreover, it has a fast computing algorithm, and its effectiveness is supported by numerical simulation studies. A multi-center Alzheimer's disease research project is used to illustrate the proposed IPLM-based analysis.
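
The abstract names local linear regression as the estimator for the nonlinear component. The sketch below illustrates only that generic building block: a kernel-weighted least-squares line fitted around each evaluation point. It is not the IPLM itself, which also handles batch effects, regularization, and measurement error. The Gaussian kernel, the bandwidth value, and the function name `local_linear` are assumptions made for illustration.

```python
import numpy as np

def local_linear(x0, x, y, bandwidth):
    """Local linear estimate of E[y | x = x0] using a Gaussian kernel."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Kernel weights: observations near x0 count more
    w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)
    # Local design matrix: intercept and centred predictor
    X = np.column_stack([np.ones_like(x), x - x0])
    # Weighted least squares: solve (X' W X) beta = X' W y
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta[0]  # the local intercept is the fitted value at x0

# Simulated example: recover a smooth curve from noisy observations
rng = np.random.default_rng(1)
x_obs = np.sort(rng.uniform(0.0, 1.0, 300))
y_obs = np.sin(2 * np.pi * x_obs) + rng.normal(0.0, 0.2, x_obs.size)
grid = np.linspace(0.05, 0.95, 10)
fitted = np.array([local_linear(g, x_obs, y_obs, bandwidth=0.08) for g in grid])
```

A useful property of the local linear fit is that it reproduces an exactly linear relationship without bias, whatever the bandwidth, which makes it a natural companion to the linear part of a partially linear model.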

18.
BMC Bioinformatics ; 23(Suppl 12): 386, 2022 Sep 23.
Article in English | MEDLINE | ID: mdl-36151511

ABSTRACT

BACKGROUND: Public data commons (PDC) have been highlighted in the scientific literature for their capacity to collect and harmonize big data. In contrast, local data commons (LDC), located within an institution or organization, have been underrepresented in the scientific literature, even though they are a critical part of research infrastructure. Being closest to the sources of data, LDCs can collect and maintain the most up-to-date, high-quality data within an organization. As data providers, LDCs face many challenges in both collecting and standardizing data; moreover, as consumers of PDC, they face data harmonization problems stemming from the monolithic harmonization pipeline designs commonly adopted by many PDCs. Unfortunately, existing guidelines and resources for building and maintaining data commons focus exclusively on PDC and provide very little information on LDC. RESULTS: This article focuses on four important observations. First, there are three different types of LDC service models, defined by their roles and requirements. These can be used as guidelines for building new LDC or enhancing the services of existing LDC. Second, the seven core services of LDC are discussed: cohort identification and facilitation of genomic sequencing, the management of molecular reports and associated infrastructure, quality control, data harmonization, data integration, data sharing, and data access control. Third, instead of the commonly developed monolithic systems, we propose a new data sharing method for data harmonization that combines divide-and-conquer and bottom-up approaches. Finally, an end-to-end LDC implementation is introduced with real-world examples. CONCLUSIONS: Although LDCs are an optimal place to identify and address data quality issues, they have traditionally been relegated to the role of passive data provider for much larger PDC. Indeed, many LDCs limit their functions to routine data storage and transmission tasks because of a lack of information on how to design, develop, and improve their services using limited resources. We hope that this work will be a first small step in raising awareness among LDCs of their expanded utility and in publicizing the importance of LDC to a wider audience.


Subject(s)
Big Data , Information Dissemination , Developing Countries , Humans
19.
BMC Genomics ; 23(1): 156, 2022 Feb 22.
Article in English | MEDLINE | ID: mdl-35193494

ABSTRACT

BACKGROUND: Patient-derived xenograft (PDX) mouse models play an important role in preclinical trials and personalized medicine. Sharing data on the models is highly valuable for numerous reasons: ethical, economic, cross-validation of research, and others. The EurOPDX Consortium was established 8 years ago to share such information, avoid duplicating efforts in developing new PDX mouse models, and unify approaches to support preclinical research. The EurOPDX Data Portal is the unified data sharing platform adopted by the Consortium. MAIN BODY: In this paper we describe the main features of the EurOPDX Data Portal (https://dataportal.europdx.eu/), its architecture, and its possible use by researchers looking for PDX mouse models for their research. The Portal offers a catalogue of European models accessible on a cooperative basis. The models are searchable by metadata, and a detailed view provides molecular profiles (gene expression, mutation, copy number alteration) and treatment studies. The Portal displays the data in multiple tools (PDX Finder, cBioPortal, and, in the future, GenomeCruzer), which are populated from a common database so that they display strictly mutually consistent views. (SHORT) CONCLUSION: The EurOPDX Data Portal is an entry point to the EurOPDX Research Infrastructure, offering PDX mouse models for collaborative research, (meta)data describing their features, and deep molecular data analysis according to users' interests.


Subject(s)
Neoplasms , Animals , Heterografts , Humans , Information Dissemination , Mice , Neoplasms/genetics , Precision Medicine , Xenograft Model Antitumor Assays
20.
Neuroimage ; 261: 119509, 2022 Nov 01.
Article in English | MEDLINE | ID: mdl-35917919

ABSTRACT

Results of neuroimaging datasets aggregated from multiple sites may be biased by site-specific profiles in participants' demographic and clinical characteristics, as well as MRI acquisition protocols and scanning platforms. We compared the impact of four different harmonization methods on results obtained from analyses of cortical thickness data: (1) linear mixed-effects model (LME) that models site-specific random intercepts (LMEINT), (2) LME that models both site-specific random intercepts and age-related random slopes (LMEINT+SLP), (3) ComBat, and (4) ComBat with a generalized additive model (ComBat-GAM). Our test case for comparing harmonization methods was cortical thickness data aggregated from 29 sites, which included 1,340 cases with posttraumatic stress disorder (PTSD) (6.2-81.8 years old) and 2,057 trauma-exposed controls without PTSD (6.3-85.2 years old). We found that, compared to the other data harmonization methods, data processed with ComBat-GAM was more sensitive to the detection of significant case-control differences (χ²(3) = 63.704, p < 0.001) as well as case-control differences in age-related cortical thinning (χ²(3) = 12.082, p = 0.007). Both ComBat and ComBat-GAM outperformed LME methods in detecting sex differences (χ²(3) = 9.114, p = 0.028) in regional cortical thickness. ComBat-GAM also led to stronger estimates of age-related declines in cortical thickness (corrected p-values < 0.001), stronger estimates of case-related cortical thickness reduction (corrected p-values < 0.001), weaker estimates of age-related declines in cortical thickness in cases than controls (corrected p-values < 0.001), stronger estimates of cortical thickness reduction in females than males (corrected p-values < 0.001), and stronger estimates of cortical thickness reduction in females relative to males in cases than controls (corrected p-values < 0.001). Our results support the use of ComBat-GAM to minimize confounds and increase statistical power when harmonizing data with non-linear effects, and the use of either ComBat or ComBat-GAM for harmonizing data with linear effects.
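
To make the idea of site harmonization concrete, the sketch below removes additive site offsets from a simulated measure while keeping the age effect. It is a deliberately simplified fixed-effects stand-in for the site-specific intercept approach: it is not ComBat or ComBat-GAM (there is no empirical Bayes shrinkage, no scale adjustment, and no nonlinear age term), and it is not the mixed-effects model from the study. The function name, the simulated data, and all effect sizes are hypothetical.

```python
import numpy as np

def remove_site_offsets(y, age, site):
    """Subtract estimated additive site offsets from y, keeping the age effect.

    Fits ordinary least squares with one intercept per site plus a common
    linear age slope, then re-centres every site on the average intercept.
    """
    y = np.asarray(y, dtype=float)
    age = np.asarray(age, dtype=float)
    sites, site_idx = np.unique(site, return_inverse=True)
    dummies = np.eye(len(sites))[site_idx]     # one indicator column per site
    X = np.column_stack([dummies, age])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    site_intercepts = coef[:len(sites)]
    grand = site_intercepts[site_idx].mean()   # sample-weighted mean intercept
    return y - site_intercepts[site_idx] + grand

# Simulated cortical-thickness-like data with three hypothetical sites
rng = np.random.default_rng(0)
n_per_site = 200
site = np.repeat(["A", "B", "C"], n_per_site)
age = rng.uniform(20, 80, size=site.size)
true_offset = np.repeat([0.00, 0.15, -0.10], n_per_site)  # scanner/site bias
thickness = 3.0 - 0.005 * age + true_offset + rng.normal(0, 0.05, size=site.size)

harmonized = remove_site_offsets(thickness, age, site)
```

Including age in the model before subtracting the site terms matters: if sites differ in age range, simply centring each site on its own mean would remove part of the real age effect along with the site bias.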


Subject(s)
Magnetic Resonance Imaging , Stress Disorders, Post-Traumatic , Adolescent , Adult , Aged , Aged, 80 and over , Case-Control Studies , Child , Female , Humans , Magnetic Resonance Imaging/methods , Male , Middle Aged , Neuroimaging , Young Adult