Search | BVS Regional Portal

Life course of retrospective harmonization initiatives: key elements to consider.

Fortier, Isabel; Wey, Tina W; Bergeron, Julie; Pinot de Moira, Angela; Nybo-Andersen, Anne-Marie; Bishop, Tom; Murtagh, Madeleine J; Miocevic, Milica; Swertz, Morris A; van Enckevort, Esther; Marcon, Yannick; Mayrhofer, Michaela Th; Ornelas, José Pedro; Sebert, Sylvain; Santos, Ana Cristina; Rocha, Artur; Wilson, Rebecca C; Griffith, Lauren E; Burton, Paul.

J Dev Orig Health Dis ; 14(2): 190-198, 2023 04.

Article in English | MEDLINE | ID: mdl-35957574

ABSTRACT

Optimizing research on the developmental origins of health and disease (DOHaD) involves implementing initiatives that maximize the use of available cohort study data; achieving sufficient statistical power to support subgroup analysis; and using participant data with adequate follow-up and exposure heterogeneity. It also involves being able to undertake comparison, cross-validation, or replication across data sets. To meet these requirements, cohort study data need to be findable, accessible, interoperable, and reusable (FAIR) and, more particularly, they often need to be harmonized. Harmonization is required to achieve or improve the comparability of putatively equivalent measures collected by different studies on different individuals. Although the characteristics of the research initiatives generating and using harmonized data vary extensively, all are confronted with similar issues. Collating, understanding, processing, hosting, and co-analyzing data from individual cohort studies is particularly challenging. The scientific success and timely management of projects can be facilitated by an ensemble of factors. The current document provides an overview of the 'life course' of research projects requiring harmonization of existing data and highlights key elements to be considered from the inception to the end of the project.

Subjects

Research Design, Humans, Cohort Studies, Retrospective Studies

Towards an Interoperable Ecosystem of Research Cohort and Real-world Data Catalogues Enabling Multi-center Studies.

Swertz, Morris; van Enckevort, Esther; Oliveira, José Luis; Fortier, Isabel; Bergeron, Julie; Thurin, Nicolas H; Hyde, Eleanor; Kellmann, Alexander; Pahoueshnja, Romin; Sturkenboom, Miriam; Cunnington, Marianne; Nybo Andersen, Anne-Marie; Marcon, Yannick; Gonçalves, Gonçalo; Gini, Rosa.

Yearb Med Inform ; 31(1): 262-272, 2022 Aug.

Article in English | MEDLINE | ID: mdl-36463884

ABSTRACT

OBJECTIVES: Existing individual-level human data cover large populations on many dimensions such as lifestyle, demography, laboratory measures, clinical parameters, etc. Recent years have seen large investments in data catalogues to FAIRify data descriptions to capitalise on this great promise, i.e. to make catalogue contents more Findable, Accessible, Interoperable and Reusable. However, their valuable diversity has also created heterogeneity, which makes it challenging to exploit their richness optimally. METHODS: In this opinion review, we analyse catalogues for human subject research ranging from cohort studies to surveillance, administrative and healthcare records. RESULTS: We observe that while these catalogues are heterogeneous, have various scopes, and use different terminologies, their underlying concepts still seem potentially harmonizable. We propose a unified framework to enable catalogue data sharing, with catalogues of multi-center cohorts nested as a special case in catalogues of real-world data sources. Moreover, we list recommendations to create an integrated community of metadata catalogues and an open catalogue ecosystem to sustain these efforts and maximise impact. CONCLUSIONS: We propose to embrace the autonomy of motivated catalogue teams and invest in their collaboration via minimal standardisation efforts such as clear data licensing, persistent identifiers for linking the same records between catalogues, minimal metadata 'common data elements' using shared ontologies, and symmetric architectures for data sharing (push/pull) with clear provenance tracks to process updates and acknowledge original contributors. Most importantly, we encourage the creation of environments for collaboration and resource sharing between catalogue developers, building on international networks such as OpenAIRE and the Research Data Alliance, as well as domain-specific ESFRIs such as BBMRI and ELIXIR.
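To make the proposed minimal standardisation concrete, the sketch below links records from two catalogues through a shared persistent identifier, maps catalogue-local fields onto a small set of common data elements, and keeps a provenance track of which catalogue contributed each element. It is a minimal illustration under stated assumptions: the catalogue names (`cat_a`, `cat_b`), the field mapping `CDE_MAP`, the identifier `pid:123` and the "first value wins" conflict rule are all hypothetical and are not taken from the paper or any real catalogue.

```python
from dataclasses import dataclass, field

# Hypothetical mapping of catalogue-local field names onto a minimal set of
# shared "common data elements" (CDEs); not taken from any real catalogue.
CDE_MAP = {
    "cat_a": {"name": "title", "n_participants": "sample_size", "country": "country"},
    "cat_b": {"study_title": "title", "size": "sample_size", "location": "country"},
}


@dataclass
class Record:
    pid: str  # persistent identifier shared by every catalogue describing the study
    fields: dict = field(default_factory=dict)
    provenance: list = field(default_factory=list)  # (catalogue, CDE) pairs


def to_common(catalogue, pid, raw):
    """Translate one catalogue-local record into common data elements."""
    mapping = CDE_MAP[catalogue]
    rec = Record(pid=pid)
    for local_name, value in raw.items():
        cde = mapping.get(local_name)
        if cde is not None:  # fields without a CDE equivalent are dropped
            rec.fields[cde] = value
            rec.provenance.append((catalogue, cde))
    return rec


def merge(records):
    """Link records that share a persistent identifier.

    The first value seen for each CDE is kept, but every contributing
    catalogue stays in the provenance track so it can be acknowledged.
    """
    merged = {}
    for rec in records:
        target = merged.setdefault(rec.pid, Record(pid=rec.pid))
        for cde, value in rec.fields.items():
            target.fields.setdefault(cde, value)
        target.provenance.extend(rec.provenance)
    return merged


# Two catalogues describing the same (hypothetical) study "pid:123".
a = to_common("cat_a", "pid:123", {"name": "Cohort X", "n_participants": 5000})
b = to_common("cat_b", "pid:123", {"study_title": "Cohort X", "location": "DK"})
merged = merge([a, b])
print(merged["pid:123"].fields)
# {'title': 'Cohort X', 'sample_size': 5000, 'country': 'DK'}
```

Keeping provenance separate from the merged values means a later update from either catalogue can be traced back and credited; a real ecosystem would also need an explicit policy for conflicting values rather than the simple first-wins rule used here.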

Subjects

Common Data Elements, Ecosystem, Humans, Cohort Studies, Information Dissemination

Worldwide mapping of initiatives that integrate population cohorts.

Rico-Uribe, Laura Alejandra; Morillo-Cuadrado, Daniel; Rodríguez-Laso, Ángel; Vorstenbosch, Ellen; Weser, Andreas J; Fincias, Laura; Marcon, Yannick; Rodriguez-Mañas, Leocadio; Haro, Josep María; Ayuso-Mateos, José Luis.

Front Public Health ; 10: 964086, 2022.

Article in English | MEDLINE | ID: mdl-36262229

dsMTL: a computational framework for privacy-preserving, distributed multi-task machine learning.

Cao, Han; Zhang, Youcheng; Baumbach, Jan; Burton, Paul R; Dwyer, Dominic; Koutsouleris, Nikolaos; Matschinske, Julian; Marcon, Yannick; Rajan, Sivanesan; Rieg, Thilo; Ryser-Welch, Patricia; Späth, Julian; Herrmann, Carl; Schwarz, Emanuel.

Bioinformatics ; 38(21): 4919-4926, 2022 10 31.

Article in English | MEDLINE | ID: mdl-36073911

ABSTRACT

MOTIVATION: In multi-cohort machine learning studies, it is critical to differentiate between effects that are reproducible across cohorts and those that are cohort-specific. Multi-task learning (MTL) is a machine learning approach that facilitates this differentiation through the simultaneous learning of prediction tasks across cohorts. Since multi-cohort data often cannot be combined into a single storage solution, an MTL application for geographically distributed data sources would be of substantial utility. RESULTS: Here, we describe the development of 'dsMTL', a computational framework for privacy-preserving, distributed multi-task machine learning that includes three supervised algorithms and one unsupervised algorithm. First, we derive the theoretical properties of these methods and the relevant machine learning workflows to ensure the validity of the software implementation. Second, we implement dsMTL as a library for the R programming language, building on the DataSHIELD platform that supports the federated analysis of sensitive individual-level data. Third, we demonstrate the applicability of dsMTL for comorbidity modeling in distributed data. We show that comorbidity modeling using dsMTL outperformed conventional federated machine learning, as well as the aggregation of multiple models built on the distributed datasets individually. dsMTL was computationally efficient and highly scalable when applied to moderate-size (n < 500) real expression data, given actual network latency. AVAILABILITY AND IMPLEMENTATION: dsMTL is freely available at https://github.com/transbioZI/dsMTLBase (server-side package) and https://github.com/transbioZI/dsMTLClient (client-side package). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
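The idea of learning cohort-specific models that are nonetheless tied together, without pooling data, can be sketched with mean-regularised multi-task least squares, a standard MTL formulation. This is plain Python, not dsMTL (which is an R library on DataSHIELD), and it is not claimed to match any of dsMTL's algorithms. Each simulated site computes the gradient of its own loss plus a penalty `lam * ||w - w_bar||^2` that pulls its weights towards the mean of all sites' weights; the coordinator only ever sees weight vectors. The simulated data, `lam`, learning rate and iteration count are illustrative assumptions.

```python
def local_gradient(X, y, w, w_bar, lam):
    """Evaluated inside one site: gradient of that site's mean squared error
    plus the mean-regularisation penalty pulling w towards w_bar.
    Only the weight vector w, never X or y, is shared with the coordinator."""
    n, d = len(X), len(w)
    grad = [0.0] * d
    for xi, yi in zip(X, y):
        resid = sum(a * b for a, b in zip(xi, w)) - yi
        for j in range(d):
            grad[j] += 2.0 * resid * xi[j] / n
    return [g + 2.0 * lam * (wj - mj) for g, wj, mj in zip(grad, w, w_bar)]


def federated_mtl(sites, d, lam=0.1, lr=0.2, iters=2000):
    """Coordinator loop: one weight vector per cohort (task)."""
    weights = [[0.0] * d for _ in sites]
    for _ in range(iters):
        # The coordinator averages weight vectors; it never sees raw data.
        w_bar = [sum(w[j] for w in weights) / len(weights) for j in range(d)]
        weights = [
            [wj - lr * gj for wj, gj in zip(w, local_gradient(X, y, w, w_bar, lam))]
            for (X, y), w in zip(sites, weights)
        ]
    return weights


# Two simulated cohorts with related but different slopes (no noise).
xs = [i / 10 for i in range(11)]
site_a = ([[1.0, x] for x in xs], [1.0 + 2.0 * x for x in xs])
site_b = ([[1.0, x] for x in xs], [1.0 + 2.5 * x for x in xs])
w_a, w_b = federated_mtl([site_a, site_b], d=2)
print([round(v, 2) for v in w_a], [round(v, 2) for v in w_b])
# [0.95, 2.11] [1.05, 2.39]
```

The fitted slopes (about 2.11 and 2.39) are pulled towards each other relative to the true 2.0 and 2.5: the shared component reflects what is reproducible across cohorts, while the remaining gap is the cohort-specific part. Larger `lam` shrinks the gap further.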

Subjects

Machine Learning, Privacy, Humans, Software, Programming Languages, Algorithms

Orchestrating privacy-protected big data analyses of data from different resources with R and DataSHIELD.

Marcon, Yannick; Bishop, Tom; Avraam, Demetris; Escriba-Montagut, Xavier; Ryser-Welch, Patricia; Wheater, Stuart; Burton, Paul; González, Juan R.

PLoS Comput Biol ; 17(3): e1008880, 2021 03.

Article in English | MEDLINE | ID: mdl-33784300

ABSTRACT

Combined analysis of multiple, large datasets is a common objective in the health- and biosciences. Existing methods tend to require researchers either to physically bring data together in one place or to follow an analysis plan and share results. Developed over the last 10 years, the DataSHIELD platform is a collection of R packages that reduce the challenges of these methods. These challenges include ethico-legal constraints, which limit researchers' ability to physically bring data together, and the analytical inflexibility associated with conventional approaches to sharing results. The key feature of DataSHIELD is that data from research studies stay on a server at each of the institutions responsible for the data. Each institution has control over who can access its data. The platform allows an analyst to pass commands to each server, and the analyst receives results that do not disclose the individual-level data of any study participant. DataSHIELD uses Opal, a data integration system used by epidemiological studies and developed by the OBiBa open-source project in the domain of bioinformatics. However, until now the analysis of big data with DataSHIELD has been limited by the storage formats available in Opal and the analysis capabilities available in the DataSHIELD R packages. We present a new architecture ("resources") for DataSHIELD and Opal that allows large, complex datasets to be used at their original location, in their original format and with external computing facilities. We provide some real big data analysis examples in genomics and geospatial projects. For genomic data analyses, we also illustrate how to extend the resources concept to address specific big data infrastructures such as GA4GH or EGA, and how to make use of shell commands. Our new infrastructure will help researchers perform privacy-protected data analyses from existing data sharing initiatives or projects. To help researchers use this framework, we describe selected packages and present an online book (https://isglobal-brge.github.io/resource_bookdown).
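The "resources" idea, describing where data live and in what format and resolving that description only on the server when a query needs it, can be sketched in Python as follows. This is a hypothetical illustration of the concept, not the API of the R packages described in the paper: the `Resource` descriptor, the `LOADERS` registry keyed by URL scheme and format, and the `DataServer` class are all assumptions, and disclosure controls are omitted for brevity.

```python
import csv
import pathlib
import tempfile
from dataclasses import dataclass
from urllib.parse import urlparse
from urllib.request import url2pathname


@dataclass(frozen=True)
class Resource:
    """Hypothetical descriptor: where a dataset lives and in what format."""
    name: str
    url: str
    format: str


def load_local_csv(res):
    """Read a CSV file in place, in its original format."""
    path = url2pathname(urlparse(res.url).path)
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh))


# Server-side registry of loaders keyed by (URL scheme, format); a real
# deployment would add loaders for databases, web services and so on.
LOADERS = {("file", "csv"): load_local_csv}


class DataServer:
    """Holds resource descriptors; only aggregate answers leave this object."""

    def __init__(self, resources):
        self._resources = {r.name: r for r in resources}

    def _resolve(self, name):
        res = self._resources[name]
        loader = LOADERS.get((urlparse(res.url).scheme, res.format))
        if loader is None:
            raise ValueError(f"no loader for {res.url} ({res.format})")
        return loader(res)  # data are read only when a query needs them

    def row_count(self, name):
        return len(self._resolve(name))

    def column_mean(self, name, column):
        values = [float(row[column]) for row in self._resolve(name)]
        return sum(values) / len(values)


with tempfile.TemporaryDirectory() as tmp:
    csv_path = pathlib.Path(tmp) / "cohort.csv"
    csv_path.write_text("id,age\n1,34\n2,51\n3,47\n")
    server = DataServer([Resource("cohort", csv_path.as_uri(), "csv")])
    n_rows = server.row_count("cohort")
    mean_age = server.column_mean("cohort", "age")
print(n_rows, mean_age)  # 3 44.0
```

The analyst only ever names the resource; supporting a new storage backend means registering another loader on the server rather than converting and moving the data.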

Subjects

Big Data, Computer Security, Software, Factual Databases, Genomics, Geographic Information Systems, Humans

Fostering population-based cohort data discovery: The Maelstrom Research cataloguing toolkit.

Bergeron, Julie; Doiron, Dany; Marcon, Yannick; Ferretti, Vincent; Fortier, Isabel.

PLoS One ; 13(7): e0200926, 2018.

Article in English | MEDLINE | ID: mdl-30040866

ABSTRACT

BACKGROUND: The lack of accessible and structured documentation creates major barriers for investigators interested in understanding, properly interpreting and analyzing cohort data and biological samples. Providing the scientific community with open information is essential to optimize the use of these resources. A cataloguing toolkit is proposed by Maelstrom Research to answer these needs and support the creation of comprehensive and user-friendly study- and network-specific web-based metadata catalogues. METHODS: Development of the Maelstrom Research cataloguing toolkit was initiated in 2004. It was informed by the exploration of existing catalogues and standards, and guided by input from partner initiatives that had used or pilot-tested incremental versions of the toolkit. RESULTS: The cataloguing toolkit is built upon two main components: a metadata model and a suite of open-source software applications. The model sets out specific fields to describe study profiles; characteristics of the subpopulations of participants; timing and design of data collection events; and datasets/variables collected at each data collection event. It also allows variables to be annotated with different classification schemes. When combined, the model and software support the implementation of study and variable catalogues and provide a powerful search engine to facilitate data discovery. CONCLUSIONS: The Maelstrom Research cataloguing toolkit already serves several national and international initiatives, and the suite of software is available to new initiatives through the Maelstrom Research website. With the support of new and existing partners, we hope to ensure regular improvements of the toolkit.
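The hierarchy the metadata model describes (study, subpopulation, data collection event, variable with classification annotations) maps naturally onto nested records, and a variable search becomes a walk over that tree. The Python sketch below is a hypothetical illustration only: the class names, field names, taxonomy key `area` and example study `COHORT-A` are assumptions, not the actual Maelstrom Research model or software.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    name: str
    label: str
    annotations: dict = field(default_factory=dict)  # taxonomy -> term


@dataclass
class DataCollectionEvent:
    name: str
    start_year: int
    variables: list = field(default_factory=list)


@dataclass
class Population:
    name: str
    events: list = field(default_factory=list)


@dataclass
class Study:
    acronym: str
    populations: list = field(default_factory=list)


def find_variables(studies, taxonomy, term):
    """Walk study -> population -> event -> variable and yield the path of
    every variable annotated with `term` under `taxonomy`."""
    for study in studies:
        for pop in study.populations:
            for event in pop.events:
                for var in event.variables:
                    if var.annotations.get(taxonomy) == term:
                        yield study.acronym, pop.name, event.name, var.name


baseline = DataCollectionEvent("Baseline", 2010, [
    Variable("smk_status", "Smoking status", {"area": "lifestyle"}),
    Variable("sbp", "Systolic blood pressure", {"area": "physical_measures"}),
])
study = Study("COHORT-A", [Population("Adults", [baseline])])
hits = list(find_variables([study], "area", "lifestyle"))
print(hits)  # [('COHORT-A', 'Adults', 'Baseline', 'smk_status')]
```

Returning the full path rather than just the variable name tells a user not only that a measure exists but in which subpopulation and at which collection event it was gathered, which is what matters when judging whether data can be reused.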

Subjects

Cohort Studies, Data Analysis, Factual Databases, Epidemiologic Studies, Humans, Statistical Models, Software, User-Computer Interface

Software Application Profile: Opal and Mica: open-source software solutions for epidemiological data management, harmonization and dissemination.

Doiron, Dany; Marcon, Yannick; Fortier, Isabel; Burton, Paul; Ferretti, Vincent.

Int J Epidemiol ; 46(5): 1372-1378, 2017 10 01.

Article in English | MEDLINE | ID: mdl-29025122

ABSTRACT

Motivation: Improving the dissemination of information on existing epidemiological studies and facilitating the interoperability of study databases are essential to maximizing the use of resources and accelerating improvements in health. To address this, Maelstrom Research proposes Opal and Mica, two interoperable open-source software packages providing out-of-the-box solutions for epidemiological data management, harmonization and dissemination. Implementation: Opal and Mica are two standalone but interoperable web applications written in Java, JavaScript and PHP. They provide web services and modern user interfaces to access them. General features: Opal allows users to import, manage, annotate and harmonize study data. Mica is used to build searchable web portals disseminating study and variable metadata. When the two are used together, Mica users can securely query and retrieve summary statistics from geographically dispersed Opal servers in real time. Integration with the DataSHIELD approach makes it possible to conduct more complex federated analyses involving statistical models. Availability: Opal and Mica are open-source and freely available at [www.obiba.org] under the General Public License (GPL) version 3, and the metadata models and taxonomies that accompany them are available under a Creative Commons licence.
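Querying summary statistics across dispersed servers can be illustrated with the simplest case: each server counts the participants matching a criterion and releases only that count, withholding small non-zero counts. This Python sketch is hypothetical and does not reproduce Opal's or Mica's actual behaviour or interfaces; `MIN_CELL`, the function names and the example studies are assumptions.

```python
MIN_CELL = 5  # hypothetical threshold below which a non-zero count is withheld


def server_count(rows, predicate, min_cell=MIN_CELL):
    """Runs on one server: count matching participants, suppress small cells."""
    n = sum(1 for row in rows if predicate(row))
    return None if 0 < n < min_cell else n


def network_count(servers, predicate):
    """Client side: query every server and total the releasable counts."""
    per_server = {name: server_count(rows, predicate) for name, rows in servers.items()}
    total = sum(c for c in per_server.values() if c is not None)
    suppressed = sorted(name for name, c in per_server.items() if c is None)
    return total, suppressed


servers = {
    "study_a": [{"age": a} for a in (45, 62, 70, 66, 61, 80, 59, 63, 71, 40, 65, 68)],
    "study_b": [{"age": a} for a in (61, 72, 30)],
}
total, suppressed = network_count(servers, lambda row: row["age"] >= 60)
print(total, suppressed)  # 9 ['study_b']
```

Because suppressed servers are reported by name, the client knows that the total is a lower bound rather than an exact network-wide figure.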

Subjects

Database Management Systems, Information Dissemination/methods, Software, Canada, Epidemiologic Studies, Humans, Internet

DataSHIELD: taking the analysis to the data, not the data to the analysis.

Gaye, Amadou; Marcon, Yannick; Isaeva, Julia; LaFlamme, Philippe; Turner, Andrew; Jones, Elinor M; Minion, Joel; Boyd, Andrew W; Newby, Christopher J; Nuotio, Marja-Liisa; Wilson, Rebecca; Butters, Oliver; Murtagh, Barnaby; Demir, Ipek; Doiron, Dany; Giepmans, Lisette; Wallace, Susan E; Budin-Ljøsne, Isabelle; Oliver Schmidt, Carsten; Boffetta, Paolo; Boniol, Mathieu; Bota, Maria; Carter, Kim W; deKlerk, Nick; Dibben, Chris; Francis, Richard W; Hiekkalinna, Tero; Hveem, Kristian; Kvaløy, Kirsti; Millar, Sean; Perry, Ivan J; Peters, Annette; Phillips, Catherine M; Popham, Frank; Raab, Gillian; Reischl, Eva; Sheehan, Nuala; Waldenberger, Melanie; Perola, Markus; van den Heuvel, Edwin; Macleod, John; Knoppers, Bartha M; Stolk, Ronald P; Fortier, Isabel; Harris, Jennifer R; Wolffenbuttel, Bruce H R; Murtagh, Madeleine J; Ferretti, Vincent; Burton, Paul R.

Int J Epidemiol ; 43(6): 1929-44, 2014 Dec.

Article in English | MEDLINE | ID: mdl-25261970

ABSTRACT

BACKGROUND: Research in modern biomedicine and social science requires sample sizes so large that they can often only be achieved through a pooled co-analysis of data from several studies. But pooling information from individuals in a central database that may be queried by researchers raises important ethico-legal questions and can be controversial. In the UK this has been highlighted by the recent debate and controversy relating to the proposed 'care.data' initiative, and these issues reflect important societal and professional concerns about privacy, confidentiality and intellectual property. DataSHIELD provides a novel technological solution that can circumvent some of the most basic challenges in giving researchers and other healthcare professionals access to individual-level data. METHODS: Commands are sent from a central analysis computer (AC) to several data computers (DCs) storing the data to be co-analysed. The data sets are analysed simultaneously, in parallel. The separate parallelized analyses are linked by non-disclosive summary statistics and commands transmitted back and forth between the DCs and the AC. This paper describes the technical implementation of DataSHIELD using a modified R statistical environment linked to an Opal database deployed behind the computer firewall of each DC. Analysis is controlled through a standard R environment at the AC. RESULTS: Based on this Opal/R implementation, DataSHIELD is currently used by the Healthy Obese Project and the Environmental Core Project (BioSHaRE-EU) for the federated analysis of 10 data sets across eight European countries, which illustrates the opportunities and challenges presented by the DataSHIELD approach. CONCLUSIONS: DataSHIELD facilitates important research in settings where: (i) a co-analysis of individual-level data from several studies is scientifically necessary but governance restrictions prohibit the release or sharing of some of the required data and/or render data access unacceptably slow; (ii) a research group (e.g. in a developing nation) is particularly vulnerable to loss of intellectual property: the researchers want to fully share the information held in their data with national and international collaborators, but do not wish to hand over the physical data themselves; and (iii) a data set is to be included in an individual-level co-analysis but its physical size precludes direct transfer to a new site for analysis.
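How separate analyses can be "linked by non-disclosive summary statistics" is easiest to see for ordinary least squares: each data computer releases only X^T X and X^T y, and the analysis computer sums them and solves the normal equations, obtaining exactly the estimate a pooled fit would give. The Python sketch below shows that special case in one round trip; it is an illustration, not DataSHIELD's R implementation (generalized linear models need iterative exchanges), and the minimum-row rule `min_n` is a hypothetical stand-in for real disclosure controls.

```python
def server_summaries(X, y, min_n=5):
    """Runs on one data computer: release only X^T X and X^T y."""
    if len(X) < min_n:  # hypothetical minimum-size disclosure rule
        raise ValueError("too few rows to release summaries")
    d = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    return xtx, xty


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def pooled_ols(summaries):
    """Analysis computer: add the per-server matrices, solve the normal equations."""
    d = len(summaries[0][1])
    xtx = [[sum(s[0][i][j] for s in summaries) for j in range(d)] for i in range(d)]
    xty = [sum(s[1][i] for s in summaries) for i in range(d)]
    return solve(xtx, xty)


# Two data computers holding rows of (intercept, x) with y = 3 + 0.5 x.
X1 = [[1.0, float(x)] for x in range(0, 6)]
X2 = [[1.0, float(x)] for x in range(6, 12)]
y1 = [3 + 0.5 * row[1] for row in X1]
y2 = [3 + 0.5 * row[1] for row in X2]
beta = pooled_ols([server_summaries(X1, y1), server_summaries(X2, y2)])
print([round(b, 6) for b in beta])  # [3.0, 0.5]
```

Since sums of X^T X and X^T y over servers equal the same quantities over the combined rows, the federated estimate matches a centralized fit without any individual-level row leaving its server.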

Subjects

Biomedical Research, Computer Security, Confidentiality, Datasets as Topic, Information Storage and Retrieval, Computational Biology, Factual Databases, Humans, United Kingdom

Data harmonization and federated analysis of population-based studies: the BioSHaRE project.

Doiron, Dany; Burton, Paul; Marcon, Yannick; Gaye, Amadou; Wolffenbuttel, Bruce H R; Perola, Markus; Stolk, Ronald P; Foco, Luisa; Minelli, Cosetta; Waldenberger, Melanie; Holle, Rolf; Kvaløy, Kirsti; Hillege, Hans L; Tassé, Anne-Marie; Ferretti, Vincent; Fortier, Isabel.

Emerg Themes Epidemiol ; 10(1): 12, 2013 Nov 21.

Article in English | MEDLINE | ID: mdl-24257327

ABSTRACT

BACKGROUND: Individual-level data pooling of large population-based studies across research centres in international research projects faces many hurdles. The BioSHaRE (Biobank Standardisation and Harmonisation for Research Excellence in the European Union) project aims to address these issues by building a collaborative group of investigators and developing tools for data harmonization, database integration and federated data analyses. METHODS: Eight population-based studies in six European countries were recruited to participate in the BioSHaRE project. Through workshops, teleconferences and electronic communications, participating investigators identified a set of 96 variables targeted for harmonization to answer research questions of interest. Harmonization potential was assessed using each study's questionnaires, standard operating procedures and data dictionaries. Whenever harmonization was deemed possible, processing algorithms were developed and implemented in an open-source software infrastructure to transform study-specific data into the target (i.e. harmonized) format. Harmonized datasets located on servers in research centres across Europe were interconnected through a federated database system to perform statistical analysis. RESULTS: Retrospective harmonization led to the generation of common-format variables for 73% of the matches considered (96 targeted variables across 8 studies). Authenticated investigators can now perform complex statistical analyses of harmonized datasets stored on distributed servers without actually sharing individual-level data, using the DataSHIELD method. CONCLUSION: New Internet-based networking technologies and database management systems are providing the means to support collaborative, multi-center research in an efficient and secure manner. The results from this pilot project show that, given a strong collaborative relationship between participating studies, it is possible to seamlessly co-analyse internationally harmonized research databases while allowing each study to retain full control over individual-level data. We encourage additional collaborative research networks in epidemiology, public health, and the social sciences to make use of the open-source tools presented herein.
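A processing algorithm in this sense is a study-specific rule that turns each study's own coding into the shared target format, and the reported harmonization rate is the share of study-variable matches (here 96 variables × 8 studies = 768 matches) for which such a rule could be written. The Python sketch below illustrates both ideas on a single hypothetical target variable, `current_smoker`, across three invented studies; the variable names, codings and rules are assumptions and are not BioSHaRE's actual algorithms.

```python
# Hypothetical target variable: current_smoker (1 = yes, 0 = no).

def from_status_code(row):
    # Study A (hypothetical coding): 1 = never, 2 = former, 3 = current.
    return 1 if row["smk_status"] == 3 else 0


def from_cigs_per_day(row):
    # Study B (hypothetical) records cigarettes per day; any daily smoking
    # is treated as current smoking, so non-daily smokers are misclassified.
    return 1 if row["cigs_day"] > 0 else 0


# One entry per study-variable match; None means harmonization was judged
# impossible (e.g. the study never collected the information).
ALGORITHMS = {"study_a": from_status_code, "study_b": from_cigs_per_day, "study_c": None}


def harmonize(study, rows):
    """Apply the study-specific processing algorithm to produce target-format rows."""
    algorithm = ALGORITHMS[study]
    if algorithm is None:
        return None
    return [{"current_smoker": algorithm(row)} for row in rows]


def harmonization_rate(algorithms):
    """Share of study-variable matches for which a processing algorithm exists."""
    return sum(a is not None for a in algorithms.values()) / len(algorithms)


print(harmonize("study_a", [{"smk_status": 3}, {"smk_status": 1}]))
print(harmonize("study_b", [{"cigs_day": 0}, {"cigs_day": 12}]))
print(round(harmonization_rate(ALGORITHMS), 3))  # 0.667
```

Keeping one small, documented function per study makes each transformation auditable, and the comment in `from_cigs_per_day` shows the kind of information loss that should be recorded alongside a "possible" harmonization status.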
