ABSTRACT
Introduction: The expansion of electronic health record (EHR) data networks over the last two decades has significantly improved the accessibility and processes around data sharing. However, there lies a gap in meeting the needs of Clinical and Translational Science Award (CTSA) hubs, particularly related to real-world data (RWD) and real-world evidence (RWE). Methods: We adopted a mixed-methods approach to construct a comprehensive needs assessment that included: (1) A Landscape Context analysis to understand the competitive environment; and (2) Customer Discovery to identify stakeholders and the value proposition related to EHR data networks. Methods included surveys, interviews, and a focus group. Results: Thirty-two CTSA institutions contributed data for analysis. Fifty-four interviews and one focus group were conducted. The synthesis of our findings pivots around five emergent themes: (1) CTSA segmentation needs vary according to resources; (2) Team science is key for success; (3) Quality of data generates trust in the network; (4) Capacity building is defined differently by researcher career stage and CTSA existing resources; and (5) Researchers' unmet needs. Conclusions: Based on the results, EHR data networks like ENACT that would like to meet the expectations of academic research centers within the CTSA consortium need to consider filling the gaps identified by our study: foster team science, improve workforce capacity, achieve data governance trust and efficiency of operation, and aid Learning Health Systems with validating, applying, and scaling the evidence to support quality improvement and high-value care. These findings align with the NIH NCATS Strategic Plan for Data Science.
ABSTRACT
Recent advancements in protein structure determination and especially in protein structure prediction techniques have led to the availability of vast amounts of macromolecular structures. However, the accessibility and integration of these structures into scientific workflows are hindered by the lack of standardization among publicly available data resources. To address this issue, we introduced the 3D-Beacons Network, a unified platform that aims to establish a standardized framework for accessing and displaying protein structure data. In this article, we highlight the importance of standardized approaches for accessing protein structure data and showcase the capabilities of 3D-Beacons. We describe four protocols for finding and accessing macromolecular structures from various specialist data resources via 3D-Beacons. First, we describe three scenarios for programmatically accessing and retrieving data using the 3D-Beacons API. Next, we show how to perform sequence-based searches to find structures from model providers. Then, we demonstrate how to search for structures and fetch them directly into a workflow using JalView. Finally, we outline the process of facilitating access to data from providers interested in contributing their structures to the 3D-Beacons Network. © 2024 The Authors. Current Protocols published by Wiley Periodicals LLC. Basic Protocol 1: Programmatic access to the 3D-Beacons API Basic Protocol 2: Sequence-based search using the 3D-Beacons API Basic Protocol 3: Accessing macromolecules from 3D-Beacons with JalView Basic Protocol 4: Enhancing data accessibility through 3D-Beacons.
Subjects
Protein Conformation , Proteins , Proteins/chemistry , Databases, Protein , Software
ABSTRACT
BACKGROUND: The Big Multiple Sclerosis Data (BMSD) network (https://bigmsdata.org) was initiated in 2014 and includes the national multiple sclerosis (MS) registries of the Czech Republic, Denmark, France, Italy, and Sweden as well as the international MSBase registry. BMSD has addressed the ethical, legal, technical, and governance-related challenges for data sharing and has so far published three scientific papers on pooled datasets as proof of concept for its collaborative design. DATA COLLECTION: Although BMSD registries operate independently on different platforms, similarities in variables, definitions and data structure allow joint analysis of data. Certain coordinated modifications in how the registries collect adverse event data have been implemented after BMSD consensus decisions, showing the ability to develop together. DATA MANAGEMENT: Scientific projects can be proposed by external sponsors via the coordinating centre, and each registry decides independently on participation, respecting its governance structure. Research datasets are established in a project-to-project fashion and a project-specific data model is developed, based on a unifying core data model. To overcome challenges in data sharing, BMSD has developed procedures for federated data analysis. FUTURE PERSPECTIVES: Presently, BMSD is seeking a qualification opinion from the European Medicines Agency (EMA) to conduct post-authorization safety studies (PASS) and aims to pursue a qualification opinion also for post-authorization effectiveness studies (PAES). BMSD aspires to promote the advancement of real-world evidence research in the MS field.
Subjects
Multiple Sclerosis , Registries , Humans , Big Data , Information Dissemination , International Cooperation , Multiple Sclerosis/epidemiology , Multiple Sclerosis/therapy
ABSTRACT
PURPOSE: In rare diseases, real-world evidence (RWE) generation is often restricted due to small patient numbers and global geographic distribution. A federated data network (FDN) approach brings together multiple data sources harmonized for collaboration to increase the power of observational research. In this paper, we review how to increase reproducibility and transparency of RWE studies in rare diseases through disease-specific FDNs. METHOD: To be successful, a multi-stakeholder scientific FDN collaboration requires a strong governance model. In such a model, each database owner remains in full control regarding the use of and access to patient-level data and is responsible for data privacy, ethical, and legal compliance. Provided that all this is well documented and good database descriptions are in place, such a governance model results in increased transparency, while reproducibility is achieved through data curation and harmonization, and distributed analytical methods. RESULTS: Leveraging the OHDSI community set of methods and tools, two rare disease-specific FDNs are discussed in more detail. For multiple myeloma, HONEUR (the Haematology Outcomes Network in Europe) has built a strong community among the data partners dedicated to scientific exchange and research. To advance scientific knowledge in pulmonary hypertension (PH), an FDN called PHederation was established to form a partnership of research institutions with PH databases coming from diverse origins.
Subjects
Rare Diseases , Humans , Rare Diseases/epidemiology , Reproducibility of Results , Databases, Factual , Europe
ABSTRACT
BACKGROUND: Data harmonisation is essential in real-world data (RWD) research projects based on hospital information systems databases, as coding systems differ between countries. The Hungarian hospital information systems and the national claims database use internationally known diagnosis codes, but data on medical procedures are recorded using national codes. There is no simple or standard solution for mapping the national codes to a standard coding system. Our aim was to map the Hungarian procedure codes (OENO) to SNOMED CT as part of the European Health Data Evidence Network (EHDEN) project. METHODS: We recruited 25 professionals from different specialties to manually map the procedure codes used between 2011 and 2021. A mapping protocol and training material were developed, results were regularly revised, and the challenges of mapping were recorded. Approximately 7% of the codes were mapped by more than one person, from different specialties, for validation purposes. RESULTS: We mapped 4661 OENO codes to standard vocabularies, mostly SNOMED CT. We categorized the challenges into three main areas: semantic, matching, and methodological. Semantic challenges refer to the occasionally unclear meaning of the OENO codes, and matching challenges to the different granularity and purpose of the OENO and SNOMED CT vocabularies. Lastly, methodological challenges describe issues related to the design of these two vocabularies. CONCLUSIONS: The challenges and solutions presented here may help other researchers to design their process to map their national codes to standard vocabularies in order to achieve greater consistency in mapping results. Moreover, we believe that our work will allow for better use of RWD collected in Hungary in international research collaborations.
Subjects
Medical Records Systems, Computerized , Systematized Nomenclature of Medicine , Humans , Hungary , Records , Databases, Factual
ABSTRACT
Harmonizing medical data sharing frameworks is challenging. Data collection and formats follow local solutions in individual hospitals; thus, interoperability is not guaranteed. The German Medical Informatics Initiative (MII) aims to provide a Germany-wide, federated, large-scale data sharing network. In the last five years, numerous efforts have been successfully completed to implement the regulatory framework and software components for securely interacting with decentralized and centralized data sharing processes. 31 German university hospitals have today established local data integration centers that are connected to the central German Portal for Medical Research Data (FDPG). Here, we present milestones and associated major achievements of various MII working groups and subprojects which led to the current status. Further, we describe major obstacles and the lessons learned during its routine application in the last six months.
Subjects
Biomedical Research , Medical Informatics , Humans , Information Dissemination , Software , Hospitals, University
ABSTRACT
BACKGROUND: Achieving early and sustained viral suppression (VS) following diagnosis of HIV infection is critical to improving outcomes for persons with HIV (PWH). The Deep South of the United States (US) is a region that is disproportionately impacted by the domestic HIV epidemic. Time to VS, defined as time from diagnosis to initial VS, is substantially longer in the South than in other regions of the US. We describe the development and implementation of a distributed data network between an academic institution and state health departments to investigate variation in time to VS in the Deep South. METHODS: Representatives of state health departments, the Centers for Disease Control and Prevention (CDC), and the academic partner met to establish core objectives and procedures at the beginning of the project. Importantly, this project used the CDC-developed Enhanced HIV/AIDS Reporting System (eHARS) through a distributed data network model that maintained the confidentiality and integrity of the data. Software programs to build datasets and calculate time to VS were written by the academic partner and shared with each public health partner. To develop spatial elements of the eHARS data, health departments geocoded residential addresses of each newly diagnosed individual in eHARS between 2012 and 2019, supported by the academic partner. Health departments conducted all analyses within their own systems. Aggregate results were combined across states using meta-analysis techniques. Additionally, we created a synthetic eHARS data set for code development and testing. RESULTS: The collaborative structure and distributed data network have allowed us to refine the study questions and analytic plans to conduct investigations into variation in time to VS for both research and public health practice. Additionally, a synthetic eHARS data set has been created and is publicly available for researchers and public health practitioners.
CONCLUSIONS: These efforts have leveraged the practice expertise and surveillance data within state health departments and the analytic and methodologic expertise of the academic partner. This study could serve as an illustrative example of effective collaboration between academic institutions and public health agencies and provides resources to facilitate future use of the US HIV surveillance system for research and public health practice.
Subjects
Acquired Immunodeficiency Syndrome , HIV Infections , United States/epidemiology , Humans , HIV Infections/epidemiology , Schools , Universities , Centers for Disease Control and Prevention, U.S.
ABSTRACT
BACKGROUND: The goal of therapy in type 1 diabetes (T1D) is to achieve optimal glycaemic targets and reduce complications. Robust data representing glycaemic outcomes across the lifespan are lacking in Australasia. AIMS: To examine contemporary glycaemic outcomes and rate of use of diabetes technologies in Australasian people with T1D. METHODS: Cross-sectional analysis of de-identified data from 18 diabetes centres maintained in the Australasian Diabetes Data Network registry during 2019. Glycaemia was measured using glycated haemoglobin (HbA1c). The proportion of people with T1D achieving the international HbA1c target of <53 mmol/mol (7%) was calculated. Rates of continuous subcutaneous insulin infusion (CSII) and continuous glucose monitoring (CGM) use were determined. RESULTS: A total of 7988 individuals with T1D with 30 575 visits were recorded in the registry. The median (interquartile range) age was 15.3 (10.0) years and diabetes duration was 5.7 (9.4) years with 49% on multiple daily injections (MDI) and 36% on CSII. The mean HbA1c for the whole cohort was 66 mmol/mol (8.2%). HbA1c increased with age, from 60 mmol/mol (7.6%) in children <10 years, increasing during adolescence and peaking at 73 mmol/mol (8.8%) in the 20-25 years age group. The HbA1c target of <53 mmol/mol (7%) was met in 18% of children and 13% of adults. HbA1c was lower on CSII as compared with those on MDI (P < 0.0001). CONCLUSIONS: Only a minority of children and adults achieve the recommended glycaemic goals despite access to specialist care in major diabetes centres. There is a need to identify factors that improve glycaemic outcomes.
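The abstract reports each HbA1c value in two units, mmol/mol (IFCC) and percent (NGSP). The sketch below shows how the paired values relate, assuming the standard IFCC-NGSP master equation (NGSP % = 0.09148 × IFCC + 2.152). The abstract does not say which conversion the authors used, so the equation and the function name `ifcc_to_ngsp` are assumptions made for illustration.

```python
def ifcc_to_ngsp(mmol_per_mol: float) -> float:
    """Convert HbA1c from IFCC units (mmol/mol) to NGSP units (%).

    Assumes the IFCC-NGSP master equation; the abstract itself does not
    state which conversion was applied.
    """
    return 0.09148 * mmol_per_mol + 2.152


# HbA1c values quoted in the abstract, in mmol/mol
for value in (53, 60, 66, 73):
    print(f"{value} mmol/mol = {ifcc_to_ngsp(value):.1f}%")
```

Rounded to one decimal place, these four values give 7.0%, 7.6%, 8.2%, and 8.8%, matching the percentages quoted alongside them in the abstract.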
Subjects
Diabetes Mellitus, Type 1 , Adolescent , Humans , Child , Adult , Diabetes Mellitus, Type 1/drug therapy , Diabetes Mellitus, Type 1/epidemiology , Hypoglycemic Agents/therapeutic use , Glycated Hemoglobin , Blood Glucose Self-Monitoring , Cross-Sectional Studies , Blood Glucose , Insulin Infusion Systems , Insulin/therapeutic use
ABSTRACT
PURPOSE: To conceptualize a particular target population and estimand for multi-site pharmacoepidemiologic studies within data networks and to analytically examine sample-standardization as a meta-analytic method compared with inverse-variance weighted meta-analyses. METHODS: The target population of interest is all and only all individuals from the data-contributing sites. Standardization, a general conditioning technique frequently employed for confounding control, was adopted to estimate the network-wide causal treatment effect. Specifically, the proposed sample-standardization yields a meta-analysis estimator, that is, a weighted summation of site-specific results, where the weight for a site is the proportion of its size in the entire network. This sample-standardization estimator was evaluated analytically in comparison to estimators from inverse-variance weighted fixed-effect and random-effects meta-analyses in terms of statistical consistency. RESULTS: A proof is reported to justify the consistency of the sample-standardization estimator with and without treatment effect heterogeneity by site. Both inverse-variance weighted fixed-effect and random-effects meta-analyses were found to generally result in inconsistent estimators in the presence of treatment effect heterogeneity by site for this particular target population and estimand. CONCLUSIONS: Sample-standardization is a valid approach to generate causal inference in multi-site studies when the target population comprises all and only all individuals within the network, even in the presence of heterogeneity of treatment effect by site. Multi-site studies should clearly specify the target population and estimand to help select the most appropriate meta-analytic methods.
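The difference between the two weighting schemes described above can be seen in a small numeric sketch. The site estimates, sizes, and variances below are hypothetical numbers chosen for illustration, not results from the study, and the function names are likewise invented here. Sample-standardization weights each site by its share of the network, whereas the fixed-effect estimator weights each site by the inverse of its variance.

```python
def sample_standardized(estimates, sizes):
    """Weighted sum of site estimates; weight = site size / network size."""
    total = sum(sizes)
    return sum(e * n / total for e, n in zip(estimates, sizes))


def inverse_variance_fixed_effect(estimates, variances):
    """Fixed-effect meta-analysis; weight = 1 / variance of the site estimate."""
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)


# Hypothetical three-site network with heterogeneous treatment effects
estimates = [0.10, 0.30, 0.20]        # site-specific effect estimates
sizes = [1000, 3000, 6000]            # number of individuals per site
variances = [0.0004, 0.0100, 0.0025]  # variances of the site estimates

standardized = sample_standardized(estimates, sizes)
fixed_effect = inverse_variance_fixed_effect(estimates, variances)
print(f"sample-standardized: {standardized:.2f}")
print(f"inverse-variance:    {fixed_effect:.2f}")
```

With these made-up inputs the two estimators give 0.22 and 0.12. Under effect heterogeneity, inverse-variance weights need not track site size, which is the situation the abstract identifies as leading to inconsistency for a network-wide target population.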
Subjects
Models, Statistical , Humans , Causality , Reference Standards , Computer Simulation
ABSTRACT
OBJECTIVES: The aim of this work is to demonstrate the use of a standardized health informatics framework to generate reliable and reproducible real-world evidence from Latin America and South Asia towards characterizing coronavirus disease 2019 (COVID-19) in the Global South. MATERIALS AND METHODS: Patient-level COVID-19 records collected in a patient self-reported notification system, hospital in-patient and out-patient records, and community diagnostic labs were harmonized to the Observational Medical Outcomes Partnership common data model and analyzed using a federated network analytics framework. Clinical characteristics of individuals tested for, diagnosed with or tested positive for, hospitalized with, admitted to intensive care unit with, or dying with COVID-19 were estimated. RESULTS: Two COVID-19 databases covering 8.3 million people from Pakistan and 2.6 million people from Bahia, Brazil were analyzed. 109 504 (Pakistan) and 921 (Brazil) medical concepts were harmonized to the Observational Medical Outcomes Partnership common data model. In total, 341 505 (4.1%) people in the Pakistan dataset and 1 312 832 (49.2%) people in the Brazilian dataset were tested for COVID-19 between January 1, 2020 and April 20, 2022, with a median [IQR] age of 36 [25, 76] and 38 [27, 50]; 40.3% and 56.5% were female in Pakistan and Brazil, respectively. 1.2% of individuals in the Pakistan dataset had Afghan ethnicity. In Brazil, 52.3% had mixed ethnicity. In agreement with international findings, COVID-19 outcomes were more severe in men, the elderly, and those with underlying health conditions. CONCLUSIONS: COVID-19 data from 2 large countries in the Global South were harmonized and analyzed using a standardized health informatics framework developed by an international community of health informaticians.
This proof-of-concept study demonstrates a potential open science framework for global knowledge mobilization and clinical translation for timely response to healthcare needs in pandemics and beyond.
Subjects
COVID-19 , Male , Humans , Female , Aged , COVID-19/epidemiology , Brazil/epidemiology , Pakistan/epidemiology , Intensive Care Units , Delivery of Health Care
ABSTRACT
Introduction: Real-world evidence (RWE) in health technology assessment (HTA) holds significant potential for informing healthcare decision-making. A multistakeholder workshop was organised by the European Health Data and Evidence Network (EHDEN) and the GetReal Institute to explore the status, challenges, and opportunities in incorporating RWE into HTA, with a focus on learning from regulatory initiatives such as the European Medicines Agency (EMA) Data Analysis and Real World Interrogation Network (DARWIN EU®). Methods: The workshop gathered key stakeholders from regulatory agencies, HTA organizations, academia, and industry for three panel discussions on RWE and HTA integration. Insights and recommendations were collected through panel discussions and audience polls. The workshop outcomes were reviewed by authors to identify key themes, challenges, and recommendations. Results: The workshop discussions revealed several important findings relating to the use of RWE in HTA. Compared with regulatory processes, its adoption in HTA to date has been slow. Barriers include limited trust in RWE, data quality concerns, and uncertainty about best practices. Facilitators include multidisciplinary training, educational initiatives, and stakeholder collaboration, which could be facilitated by initiatives like EHDEN and the GetReal Institute. Demonstrating the impact of "driver projects" could promote RWE adoption in HTA. Conclusion: To enhance the integration of RWE in HTA, it is crucial to address known barriers through comprehensive training, stakeholder collaboration, and impactful exemplar research projects. Upskilling users and beneficiaries of RWE and those that generate it, promoting collaboration, and conducting "driver projects" can strengthen the HTA evidence base for more informed healthcare decisions.
ABSTRACT
While scientists can often infer the biological function of proteins from their 3-dimensional quaternary structures, the gap between the number of known protein sequences and their experimentally determined structures keeps increasing. A potential solution to this problem is presented by ever more sophisticated computational protein modeling approaches. While often powerful on their own, most methods have strengths and weaknesses. Therefore, it benefits researchers to examine models from various model providers and perform comparative analysis to identify what models can best address their specific use cases. To make data from a large array of model providers more easily accessible to the broader scientific community, we established 3D-Beacons, a collaborative initiative to create a federated network with unified data access mechanisms. The 3D-Beacons Network allows researchers to collate coordinate files and metadata for experimentally determined and theoretical protein models from state-of-the-art and specialist model providers and also from the Protein Data Bank.
Subjects
Metadata , Records , Amino Acid Sequence , Databases, Protein , Computer Simulation
ABSTRACT
AIMS: Islet autoantibody screening of infants and young children in the Northern Hemisphere, together with semi-annual metabolic monitoring, is associated with a lower risk of ketoacidosis (DKA) and improved glucose control after diagnosis of clinical (stage 3) type 1 diabetes (T1D). We aimed to determine if similar benefits applied to older Australians and New Zealanders monitored less rigorously. METHODS: DKA occurrence and metabolic control were compared between T1D relatives screened and monitored for T1D and unscreened individuals diagnosed in the general population, ascertained from the Australasian Diabetes Data Network. RESULTS: Between 2005 and 2019, 17,105 relatives (mean (SD) age 15.7 (10.8) years; 52% female) were screened for autoantibodies against insulin, glutamic acid decarboxylase, and insulinoma-associated protein 2. Of these, 652 screened positive to a single and 306 to multiple autoantibody specificities, of whom 201 and 215, respectively, underwent metabolic monitoring. Of 178 relatives diagnosed with stage 3 T1D, 9 (5%) had DKA, 7 of whom had not undertaken metabolic monitoring. The frequency of DKA in the general population was 31%. After correction for age, sex and T1D family history, the frequency of DKA in screened relatives was >80% lower than in the general population. HbA1c and insulin requirements following diagnosis were also lower in screened relatives, consistent with greater beta cell reserve. CONCLUSIONS: T1D autoantibody screening and metabolic monitoring of older children and young adults in Australia and New Zealand, by enabling pre-clinical diagnosis when beta cell reserve is greater, confers protection from DKA. These clinical benefits support ongoing efforts to increase screening activity in the region and should facilitate the application of emerging immunotherapies.
Subjects
Diabetes Mellitus, Type 1 , Diabetic Ketoacidosis , Ketosis , Child , Infant , Humans , Female , Adolescent , Child, Preschool , Male , Diabetes Mellitus, Type 1/complications , New Zealand , Diabetic Ketoacidosis/epidemiology , Australia , Insulin/therapeutic use , Autoantibodies
ABSTRACT
BACKGROUND: Observational studies incorporating real-world data from multiple institutions facilitate study of rare outcomes or exposures and improve generalizability of results. Due to privacy concerns surrounding patient-level data sharing across institutions, methods for performing regression analyses distributively are desirable. Meta-analysis of institution-specific estimates is commonly used, but has been shown to produce biased estimates in certain settings. While distributed regression methods are increasingly available, methods for analyzing count outcomes are currently limited. Count data in practice are commonly subject to overdispersion, exhibiting greater variability than expected under a given statistical model. OBJECTIVE: We propose a novel computational method, a one-shot distributed algorithm for quasi-Poisson regression (ODAP), to distributively model count outcomes while accounting for overdispersion. METHODS: ODAP incorporates a surrogate likelihood approach to perform distributed quasi-Poisson regression without requiring patient-level data sharing, only requiring sharing of aggregate data from each participating institution. ODAP requires at most three rounds of non-iterative communication among institutions to generate coefficient estimates and corresponding standard errors. In simulations, we evaluate ODAP under several data scenarios possible in multi-site analyses, comparing ODAP and meta-analysis estimates in terms of error relative to pooled regression estimates, considered the gold standard. In a proof-of-concept real-world data analysis, we similarly compare ODAP and meta-analysis in terms of relative error to pooled estimation using data from the OneFlorida Clinical Research Consortium, modeling length of stay in COVID-19 patients as a function of various patient characteristics.
In a second proof-of-concept analysis, using the same outcome and covariates, we incorporate data from the UnitedHealth Group Clinical Discovery Database together with the OneFlorida data in a distributed analysis to compare estimates produced by ODAP and meta-analysis. RESULTS: In simulations, ODAP exhibited negligible error relative to pooled regression estimates across all settings explored. Meta-analysis estimates, while largely unbiased, were increasingly variable as heterogeneity in the outcome increased across institutions. When baseline expected count was 0.2, relative error for meta-analysis was above 5% in 25% of iterations (250/1000), while the largest relative error for ODAP in any iteration was 3.59%. In our proof-of-concept analysis using only OneFlorida data, ODAP estimates were closer to pooled regression estimates than those produced by meta-analysis for all 15 covariates. In our distributed analysis incorporating data from both OneFlorida and the UnitedHealth Group Clinical Discovery Database, ODAP and meta-analysis estimates were largely similar, while some differences in estimates (as large as 13.8%) could be indicative of bias in meta-analytic estimates. CONCLUSIONS: ODAP performs privacy-preserving, communication-efficient distributed quasi-Poisson regression to analyze count outcomes using data stored within multiple institutions. Our method produces estimates nearly matching pooled regression estimates and sometimes more accurate than meta-analysis estimates, most notably in settings with relatively low counts and high outcome heterogeneity across institutions.
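To make the idea of overdispersion concrete, the following sketch estimates a quasi-Poisson dispersion parameter from site-level aggregates, using the usual Pearson estimator (sum of squared Pearson residuals divided by the residual degrees of freedom). This is an illustration of the general concept only and is not the ODAP algorithm. The counts, fitted means, and function names are hypothetical, and the fitted means are assumed to come from an already fitted common model.

```python
def pearson_sum(counts, fitted_means):
    """Site-level aggregate: sum of squared Pearson residuals (y - mu)^2 / mu."""
    return sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted_means))


def pooled_dispersion(site_summaries, n_coefficients):
    """Combine (pearson_sum, n_observations) pairs into one dispersion estimate."""
    total_chi2 = sum(chi2 for chi2, _ in site_summaries)
    total_n = sum(n for _, n in site_summaries)
    return total_chi2 / (total_n - n_coefficients)


# Hypothetical counts and fitted means held at two sites. Only the two
# aggregate numbers per site leave the site, not the patient-level rows.
site_a = (pearson_sum([0, 4, 1, 7], [2.0, 2.0, 4.0, 4.0]), 4)
site_b = (pearson_sum([3, 9, 2], [3.0, 4.0, 5.0]), 3)

phi = pooled_dispersion([site_a, site_b], n_coefficients=2)
print(f"estimated dispersion: {phi:.2f}")
```

A value near 1 would be consistent with a plain Poisson model; here the made-up data give roughly 3.31, the kind of excess variability a quasi-Poisson model is meant to absorb.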
Subjects
COVID-19 , Algorithms , COVID-19/epidemiology , Humans , Likelihood Functions , Models, Statistical , Regression Analysis
ABSTRACT
INTRODUCTION: There remains a need to optimize treatments and improve outcomes among patients with hematologic malignancies. The timely synthesis and analysis of real-world data could play a key role. OBJECTIVES: The Haematology Outcomes Network in Europe (HONEUR) is a federated data network (FDN) that aims to overcome the challenges of heterogeneous data collected from different registries, hospitals, and other databases in different countries. It has the functionality required to analyze data from various sources in a time-efficient manner, while preserving local data security and governance. With this, research studies can be performed that can increase knowledge and understanding of the management of patients with hematologic malignancies. METHODS: HONEUR uses the Observational Medical Outcomes Partnership (OMOP) common data model, which allows analysis scripts to be run by multiple sites using their own data, ultimately generating aggregated results. Furthermore, distributed analytics can be used to run statistical analyses across multiple sites, as if data were pooled. The external governance model ensures high-quality standards, while data ownership is retained locally. Twenty partners from nine countries are now participating, with data from more than 26 000 patients available for analysis. Research questions that can be addressed through HONEUR include assessments of natural disease history, treatment patterns, and clinical effectiveness. CONCLUSIONS: The HONEUR FDN marks an important step forward in increasing the value of information routinely captured by individual hospitals, registries and other database holders, thus enabling larger-scale studies to be undertaken rapidly and efficiently.
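The pattern described in the METHODS section, where the same script runs at every site against a common data model and only aggregates are returned, can be sketched as follows. This is a minimal illustration and not HONEUR's actual software. The rows are simplified dictionaries rather than a real OMOP database, the concept ID 12345 is a made-up placeholder, and the function names are hypothetical.

```python
from collections import Counter


def local_analysis(rows, concept_id):
    """Runs inside one site: count distinct persons with the given condition.

    Only the aggregate count is returned; patient-level rows stay local.
    """
    persons = {r["person_id"] for r in rows
               if r["condition_concept_id"] == concept_id}
    return {"n_persons": len(persons)}


def combine(site_results):
    """Runs at the coordinating centre: sum the aggregates from all sites."""
    total = Counter()
    for result in site_results:
        total.update(result)
    return dict(total)


# Hypothetical local data at two sites, already mapped to the same schema
site_1 = [
    {"person_id": 1, "condition_concept_id": 12345},
    {"person_id": 1, "condition_concept_id": 12345},  # repeat record, same person
    {"person_id": 2, "condition_concept_id": 999},
]
site_2 = [
    {"person_id": 10, "condition_concept_id": 12345},
    {"person_id": 11, "condition_concept_id": 12345},
]

combined = combine([local_analysis(s, 12345) for s in (site_1, site_2)])
print(combined)
```

Because both sites share the same field names, the identical `local_analysis` function works unchanged at each site, which is the practical benefit of a common data model.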
Subjects
Hematologic Neoplasms , Hematology , Databases, Factual , Europe/epidemiology , Hematologic Neoplasms/diagnosis , Hematologic Neoplasms/epidemiology , Hematologic Neoplasms/therapy , Humans , Registries
ABSTRACT
OBJECTIVES: Real-world evidence (RWE) plays an important role in addressing key research questions of interest to healthcare decision makers. Federated data networks (FDNs) apply novel technology to enable the conduct of RWE studies with multiple partners, without the need to share the individual partner's data set. A systematic review of the published literature was performed to determine which types of research questions can best be addressed through FDNs, specifically in the field of oncology. METHODS: Systematic searches of MEDLINE and Embase were undertaken to identify the types of research questions that had been addressed in studies using FDNs. Additional information was retrieved about study characteristics, statistical methods, and the FDN itself. RESULTS: In total, 40 publications were included where research questions on the following had been addressed (multiple categories possible): disease natural history (58%), safety surveillance (18%), treatment pathways (15%), comparative effectiveness (10%), and cost/resource use studies (3%); 13% of studies had to be left uncategorized. A total of 50% of the studies were run with data partners in networks of ≤5. The size of the networks ranged from 227 patients to >5 million patients. Statistical methods used included distributed learning and distributed regression methods. CONCLUSIONS: Further work is needed to raise awareness of the important role that FDNs can play in leveraging readily available RWE to address key research questions of interest in cancer and the benefits to the research community in engaging in federated data initiatives with a long-term perspective.
Subjects
Medical Oncology , Neoplasms , Data Collection , Humans , Neoplasms/therapy
ABSTRACT
BACKGROUND AND OBJECTIVE: As a response to the ongoing COVID-19 pandemic, several prediction models in the existing literature were rapidly developed, with the aim of providing evidence-based guidance. However, none of these COVID-19 prediction models have been found to be reliable. Models are commonly assessed to have a risk of bias, often due to insufficient reporting, use of non-representative data, and lack of large-scale external validation. In this paper, we present the Observational Health Data Sciences and Informatics (OHDSI) analytics pipeline for patient-level prediction modeling as a standardized approach for rapid yet reliable development and validation of prediction models. We demonstrate how our analytics pipeline and open-source software tools can be used to answer important prediction questions while limiting potential causes of bias (e.g., by validating phenotypes, specifying the target population, performing large-scale external validation, and publicly providing all analytical source code). METHODS: We show step-by-step how to implement the analytics pipeline for the question: 'In patients hospitalized with COVID-19, what is the risk of death 0 to 30 days after hospitalization?'. We develop models using six different machine learning methods in a USA claims database containing over 20,000 COVID-19 hospitalizations and externally validate the models using data containing over 45,000 COVID-19 hospitalizations from South Korea, Spain, and the USA. RESULTS: Our open-source software tools enabled us to efficiently go end-to-end from problem design to reliable model development and evaluation. When predicting death in patients hospitalized with COVID-19, AdaBoost, random forest, gradient boosting machine, and decision tree yielded similar or lower internal and external validation discrimination performance compared to L1-regularized logistic regression, whereas the MLP neural network consistently resulted in lower discrimination.
L1-regularized logistic regression models were well calibrated. CONCLUSION: Our results show that following the OHDSI analytics pipeline for patient-level prediction modelling can enable the rapid development towards reliable prediction models. The OHDSI software tools and pipeline are open source and available to researchers from all around the world.
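The two evaluation notions used in the abstract above, discrimination and calibration, can be illustrated with a minimal sketch. This is not the OHDSI software (which is distributed as its own open-source tools); the function names and the toy "internal" and "external" data below are hypothetical and chosen only to show how the same metrics are computed on a development set and on an external validation set.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Discrimination: area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive case has the higher
    predicted risk. Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_in_the_large(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Calibration summary: mean predicted risk minus observed event rate.
    A value near 0 means the model is, on average, neither over- nor
    under-predicting risk."""
    return sum(scores) / len(scores) - sum(labels) / len(labels)


# Hypothetical toy data: outcome labels (1 = death within 30 days) and
# predicted risks for a development database and an external database.
internal_y = [0, 0, 1, 1, 0, 1]
internal_p = [0.1, 0.3, 0.8, 0.6, 0.2, 0.9]
external_y = [0, 1, 0, 1]
external_p = [0.2, 0.4, 0.5, 0.7]

internal_auc = auc(internal_y, internal_p)
external_auc = auc(external_y, external_p)
external_cal = calibration_in_the_large(external_y, external_p)

print(f"internal AUC: {internal_auc:.2f}")
print(f"external AUC: {external_auc:.2f}")
print(f"external calibration-in-the-large: {external_cal:+.2f}")
```

A drop in AUC from the development data to the external data, as in this toy example, is the kind of signal that large-scale external validation is meant to surface.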
Subjects
COVID-19, Pandemics, Humans, Logistic Models, Machine Learning, SARS-CoV-2
ABSTRACT
Clinical data networks that leverage large volumes of data in electronic health records (EHRs) are significant resources for research on coronavirus disease 2019 (COVID-19). Data harmonization is a key challenge in seamless use of multisite EHRs for COVID-19 research. We developed a COVID-19 application ontology in the national Accrual to Clinical Trials (ACT) network that enables harmonization of data elements that are critical to COVID-19 research. The ontology contains over 50,000 concepts in the domains of diagnosis, procedures, medications, and laboratory tests. In particular, it has computational phenotypes to characterize the course of illness and outcomes, derived terms, and harmonized value sets for severe acute respiratory syndrome coronavirus 2 laboratory tests. The ontology was deployed and validated on the ACT COVID-19 network that consists of 9 academic health centers with data on 14.5M patients. This ontology, which is freely available to the entire research community on GitHub at https://github.com/shyamvis/ACT-COVID-Ontology, will be useful for harmonizing EHRs for COVID-19 research beyond the ACT network.
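The idea of a harmonized value set for laboratory tests, as described above, can be sketched as a lookup that maps site-specific codes and free-text results onto shared concepts. Everything in this example is an assumption made for illustration: the site names, local codes, concept labels, and the `harmonize` helper are invented placeholders and are not taken from the ACT COVID-19 ontology.

```python
from typing import Optional, Tuple

# Hypothetical value set: (site, local lab code) -> shared concept label.
# These entries are invented placeholders, not real ontology content.
VALUE_SET = {
    ("site_a", "LAB123"): "SARS_COV_2_PCR",
    ("site_b", "COVIDNAA"): "SARS_COV_2_PCR",
    ("site_b", "COVIDAB"): "SARS_COV_2_ANTIBODY",
}

# Hypothetical normalization of free-text result strings.
RESULT_MAP = {
    "pos": "POSITIVE",
    "positive": "POSITIVE",
    "detected": "POSITIVE",
    "neg": "NEGATIVE",
    "negative": "NEGATIVE",
    "not detected": "NEGATIVE",
}


def harmonize(site: str, local_code: str, raw_result: str) -> Optional[Tuple[str, str]]:
    """Map one local lab record to (shared concept, normalized result).

    Returns None when the local code is not covered by the value set, so
    unmapped records can be reviewed instead of being silently dropped.
    """
    concept = VALUE_SET.get((site, local_code))
    if concept is None:
        return None
    result = RESULT_MAP.get(raw_result.strip().lower(), "UNKNOWN")
    return concept, result


# Two sites record the same test under different codes and result wording;
# after harmonization both records refer to the same concept.
print(harmonize("site_a", "LAB123", " Detected "))
print(harmonize("site_b", "COVIDNAA", "Not Detected"))
print(harmonize("site_c", "XYZ", "pos"))
```

Once every site's records are expressed in the shared concepts, a single query can count positive tests across the whole network without per-site special cases.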
ABSTRACT
Clinical data networks that leverage large volumes of data in electronic health records (EHRs) are significant resources for research on coronavirus disease 2019 (COVID-19). Data harmonization is a key challenge in seamless use of multisite EHRs for COVID-19 research. We developed a COVID-19 application ontology in the national Accrual to Clinical Trials (ACT) network that enables harmonization of data elements that are critical to COVID-19 research. The ontology contains over 50,000 concepts in the domains of diagnosis, procedures, medications, and laboratory tests. In particular, it has computational phenotypes to characterize the course of illness and outcomes, derived terms, and harmonized value sets for SARS-CoV-2 laboratory tests. The ontology was deployed and validated on the ACT COVID-19 network that consists of nine academic health centers with data on 14.5M patients. This ontology, which is freely available to the entire research community on GitHub at https://github.com/shyamvis/ACT-COVID-Ontology, will be useful for harmonizing EHRs for COVID-19 research beyond the ACT network.
ABSTRACT
OBJECTIVE: To describe PCORnet, a clinical research network developed for patient-centered outcomes research on a national scale. STUDY DESIGN AND SETTING: Descriptive study of the current state and future directions for PCORnet. We conducted cross-sectional analyses of the health systems and patient populations of the 9 Clinical Research Networks and 2 Health Plan Research Networks that are part of PCORnet. RESULTS: Within the Clinical Research Networks, electronic health data are currently collected from 337 hospitals, 169,695 physicians, 3,564 primary care practices, 338 emergency departments, and 1,024 community clinics. Patients can be recruited for prospective studies from any of these clinical sites. The Clinical Research Networks have accumulated data from 80 million patients with at least one visit from 2009 to 2018. The PCORnet Health Plan Research Network population of individuals with a valid enrollment segment from 2009 to 2019 exceeds 60 million individuals, who on average have 2.63 years of follow-up. CONCLUSION: PCORnet's infrastructure comprises clinical data from a diverse cohort of patients and has the capacity to rapidly access these patient populations for pragmatic clinical trials, epidemiological research, and patient-centered research on rare diseases.