1.
Res Sq ; 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38883709

ABSTRACT

Accurate identification of acute coronary syndrome (ACS) in the prehospital setting is important for timely treatments that reduce damage to the compromised myocardium. Current machine learning approaches lack sufficient performance to safely rule-in or rule-out ACS. Our goal is to identify a method that bridges this gap. To do so, we retrospectively evaluate two promising approaches, an ensemble of gradient boosted decision trees (GBDT) and selective classification (SC) on consecutive patients transported by ambulance to the ED with chest pain and/or anginal equivalents. On the task of ACS classification with 23 prehospital covariates, we found the fusion of the two (GBDT+SC) improves the best reported sensitivity and specificity by 8% and 23% respectively. Accordingly, GBDT+SC is safer than current machine learning approaches to rule-in and rule-out of ACS in the prehospital setting.
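The idea of selective classification can be illustrated with a minimal sketch: a classifier's predicted probability is turned into rule-in, rule-out, or abstain. The thresholds (`low`, `high`) and the function names are hypothetical and are not taken from the paper; the probabilities are assumed to come from some upstream model such as a GBDT ensemble.

```python
from typing import List, Optional


def selective_predict(probs: List[float], low: float = 0.1,
                      high: float = 0.9) -> List[Optional[int]]:
    """Return 1 (rule-in), 0 (rule-out), or None (abstain) for each probability.

    `low` and `high` are illustrative thresholds, not values from the study.
    """
    decisions: List[Optional[int]] = []
    for p in probs:
        if p >= high:
            decisions.append(1)      # confident positive: rule-in
        elif p <= low:
            decisions.append(0)      # confident negative: rule-out
        else:
            decisions.append(None)   # uncertain: defer to a clinician
    return decisions


def coverage(decisions: List[Optional[int]]) -> float:
    """Fraction of cases on which the classifier commits to a decision."""
    return sum(d is not None for d in decisions) / len(decisions)
```

Widening the abstain band usually trades coverage for accuracy on the cases that are still classified, which is the trade-off a selective classifier is tuned on.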

2.
Front Public Health ; 12: 1347862, 2024.
Article in English | MEDLINE | ID: mdl-38737862

ABSTRACT

The COVID-19 pandemic has necessitated the development of robust tools for tracking and modeling the spread of the virus. We present 'K-Track-Covid,' an interactive web-based dashboard developed using the R Shiny framework, to offer users an intuitive interface for analyzing the geographical and temporal spread of COVID-19 in South Korea. Our dashboard uses dynamic user interface elements, applies validated epidemiological models, and integrates regional data to offer tailored visual displays. The dashboard allows users to customize their data views by selecting specific time frames, geographic regions, and demographic groups. This customization enables the generation of charts and statistical summaries pertinent to both daily fluctuations and cumulative counts of COVID-19 cases, as well as mortality statistics. Additionally, the dashboard offers a simulation model based on mathematical models, enabling users to make predictions under various parameter settings. The dashboard is designed to assist researchers, policymakers, and the public in understanding the spread and impact of COVID-19, thereby facilitating informed decision-making. All data and resources related to this study are publicly available to ensure transparency and facilitate further research.


Subject(s)
COVID-19 , Internet , Humans , Republic of Korea/epidemiology , COVID-19/epidemiology , SARS-CoV-2 , User-Computer Interface , Pandemics , Epidemiological Models
3.
J Am Med Inform Assoc ; 30(7): 1293-1300, 2023 06 20.
Article in English | MEDLINE | ID: mdl-37192819

ABSTRACT

Research increasingly relies on interrogating large-scale data resources. The NIH National Heart, Lung, and Blood Institute developed the NHLBI BioData Catalyst® (BDC), a community-driven ecosystem where researchers, including bench and clinical scientists, statisticians, and algorithm developers, find, access, share, store, and compute on large-scale datasets. This ecosystem provides secure, cloud-based workspaces, user authentication and authorization, search, tools and workflows, applications, and new innovative features to address community needs, including exploratory data analysis, genomic and imaging tools, tools for reproducibility, and improved interoperability with other NIH data science platforms. BDC offers straightforward access to large-scale datasets and computational resources that support precision medicine for heart, lung, blood, and sleep conditions, leveraging separately developed and managed platforms to maximize flexibility based on researcher needs, expertise, and backgrounds. Through the NHLBI BioData Catalyst Fellows Program, BDC facilitates scientific discoveries and technological advances. BDC also facilitated accelerated research on the coronavirus disease-2019 (COVID-19) pandemic.


Subject(s)
COVID-19 , Cloud Computing , Humans , Ecosystem , Reproducibility of Results , Lung , Software
4.
Health Informatics J ; 29(2): 14604582231170892, 2023.
Article in English | MEDLINE | ID: mdl-37066514

ABSTRACT

The Integrated Clinical and Environmental Exposures Service (ICEES) provides open regulatory-compliant access to clinical data, including electronic health record data, that have been integrated with environmental exposures data. While ICEES has been validated in the context of an asthma use case and several other use cases, the regulatory constraints on the ICEES open application programming interface (OpenAPI) result in data loss when using the service for multivariate analysis. In this study, we investigated the robustness of the ICEES OpenAPI through a comparative analysis, in which we applied a generalized linear model (GLM) to the OpenAPI data and the constraint-free source data to examine factors predictive of asthma exacerbations. Consistent with previous studies, we found that the main predictors identified by both analyses were sex, prednisone, race, obesity, and airborne particulate exposure. Comparison of GLM model fit revealed that data loss impacts model quality, but only with select interaction terms. We conclude that the ICEES OpenAPI supports multivariate analysis, albeit with potential data loss that users should be aware of.


Subject(s)
Asthma , Electronic Health Records , Humans , Linear Models , Environmental Exposure , Software , Asthma/epidemiology
5.
J Am Med Inform Assoc ; 30(3): 447-455, 2023 02 16.
Article in English | MEDLINE | ID: mdl-36451264

ABSTRACT

OBJECTIVE: This article describes the implementation of a privacy-preserving record linkage (PPRL) solution across PCORnet®, the National Patient-Centered Clinical Research Network. MATERIAL AND METHODS: Using a PPRL solution from Datavant, we quantified the degree of patient overlap across the network and report a de-duplicated analysis of the demographic and clinical characteristics of the PCORnet population. RESULTS: There were ∼170M patient records across the responding Network Partners, with ∼138M (81%) of those corresponding to a unique patient. 82.1% of patients were found in a single partner and 14.7% were in 2. The percentage overlap between Partners ranged between 0% and 80% with a median of 0%. Linking patients' electronic health records with claims increased disease prevalence in every clinical characteristic, ranging between 63% and 173%. DISCUSSION: The overlap between Partners was variable and depended on timeframe. However, patient data linkage changed the prevalence profile of the PCORnet patient population. CONCLUSIONS: This project was one of the largest linkage efforts of its kind and demonstrates the potential value of record linkage. Linkage between Partners may be most useful in cases where there is geographic proximity between Partners, an expectation that potential linkage Partners will be able to fill gaps in data, or a longer study timeframe.


Subject(s)
Confidentiality , Privacy , Humans , Medical Record Linkage , Computer Security , Electronic Health Records , Patient-Centered Care , Demography
6.
BMC Res Notes ; 15(1): 337, 2022 Oct 31.
Article in English | MEDLINE | ID: mdl-36316778

ABSTRACT

OBJECTIVE: The aim of this study was to determine whether a secure, privacy-preserving record linkage (PPRL) methodology can be implemented in a scalable manner for use in a large national clinical research network. RESULTS: We established the governance and technical capacity to support the use of PPRL across the National Patient-Centered Clinical Research Network (PCORnet®). As a pilot, four sites used the Datavant software to transform patient personally identifiable information (PII) into de-identified tokens. We queried the sites for patients with a clinical encounter in 2018 or 2019 and matched their tokens to determine whether overlap existed. We described patient overlap among the sites and generated a "deduplicated" table of patient demographic characteristics. Overlapping patients were found in 3 of the 6 site-pairs. Following deduplication, the total patient count was 3,108,515 (0.11% reduction), with the largest reduction in count for patients with an "Other/Missing" value for Sex; from 198 to 163 (17.6% reduction). The PPRL solution successfully links patients across data sources using distributed queries without directly accessing patient PII. The overlap queries and analysis performed in this pilot are being replicated across the full network to provide additional insight into patient linkages among a distributed research network.
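Token-based linkage of the kind described above can be sketched in a few lines. This is a generic illustration using a keyed hash (HMAC-SHA256) over normalized identifiers; it is not the Datavant algorithm, and the field choice, normalization, and function names are assumptions made for the example.

```python
import hashlib
import hmac
from typing import Iterable, Set


def make_token(first: str, last: str, dob: str, secret: bytes) -> str:
    """Turn identifying fields into an opaque token (illustrative only).

    Fields are trimmed and lower-cased so that trivial formatting
    differences between sites do not produce different tokens.
    """
    normalized = "|".join(part.strip().lower() for part in (first, last, dob))
    return hmac.new(secret, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def overlap(tokens_a: Iterable[str], tokens_b: Iterable[str]) -> Set[str]:
    """Tokens present at both sites, i.e. candidate shared patients."""
    return set(tokens_a) & set(tokens_b)
```

Each site would compute tokens locally with a shared secret and exchange only the tokens, so the comparison in `overlap` never sees the underlying PII.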


Subject(s)
Electronic Health Records , Privacy , Humans , Medical Record Linkage/methods , Databases, Factual , Patient-Centered Care
7.
Front Artif Intell ; 5: 918888, 2022.
Article in English | MEDLINE | ID: mdl-35837616

ABSTRACT

Research on rare diseases has received increasing attention, in part due to the realized profitability of orphan drugs. Biomedical informatics holds promise in accelerating translational research on rare disease, yet challenges remain, including the lack of diagnostic codes for rare diseases and privacy concerns that prevent research access to electronic health records when few patients exist. The Integrated Clinical and Environmental Exposures Service (ICEES) provides regulatory-compliant open access to electronic health record data that have been integrated with environmental exposures data, as well as analytic tools to explore the integrated data. We describe a proof-of-concept application of ICEES to examine demographics, clinical characteristics, environmental exposures, and health outcomes among a cohort of patients enriched for phenotypes associated with cystic fibrosis (CF), idiopathic bronchiectasis (IB), and primary ciliary dyskinesia (PCD). We then focus on a subset of patients with CF, leveraging the availability of a diagnostic code for CF and serving as a benchmark for our development work. We use ICEES to examine select demographics, co-diagnoses, and environmental exposures that may contribute to poor health outcomes among patients with CF, defined as emergency department or inpatient visits for respiratory issues. We replicate current understanding of the pathogenesis and clinical manifestations of CF by identifying co-diagnoses of asthma, chronic nasal congestion, cough, middle ear disease, and pneumonia as factors that differentiate patients with poor health outcomes from those with better health outcomes. We conclude by discussing our preliminary findings in relation to other published work, the strengths and limitations of our approach, and our future directions.

8.
Bioinformatics ; 38(12): 3252-3258, 2022 06 13.
Article in English | MEDLINE | ID: mdl-35441678

ABSTRACT

MOTIVATION: As the number of public data resources continues to proliferate, identifying relevant datasets across heterogeneous repositories is becoming critical to answering scientific questions. To help researchers navigate this data landscape, we developed Dug: a semantic search tool for biomedical datasets utilizing evidence-based relationships from curated knowledge graphs to find relevant datasets and explain why those results are returned. RESULTS: Developed through the National Heart, Lung and Blood Institute's (NHLBI) BioData Catalyst ecosystem, Dug has indexed more than 15 911 study variables from public datasets. On a manually curated search dataset, Dug's total recall (total relevant results/total results) of 0.79 outperformed default Elasticsearch's total recall of 0.76. When using synonyms or related concepts as search queries, Dug (0.36) far outperformed Elasticsearch (0.14) in terms of total recall with no significant loss in the precision of its top results. AVAILABILITY AND IMPLEMENTATION: Dug is freely available at https://github.com/helxplatform/dug. An example Dug deployment is also available for use at https://search.biodatacatalyst.renci.org/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
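For readers comparing search tools on metrics like these, the sketch below computes recall and precision at k in their conventional information-retrieval sense (relevant items retrieved divided by all relevant items, and relevant items among the top k divided by k). The abstract's own parenthetical definition of total recall may differ, so treat this as an assumption; the function names are hypothetical.

```python
from typing import Iterable, List, Set


def recall(retrieved: Iterable[str], relevant: Set[str]) -> float:
    """Share of all relevant items that the search returned."""
    if not relevant:
        return 0.0
    return len(relevant & set(retrieved)) / len(relevant)


def precision_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k results that are relevant."""
    top = retrieved[:k]
    if not top:
        return 0.0
    return sum(1 for item in top if item in relevant) / len(top)
```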


Subject(s)
Search Engine , Semantics , Ecosystem , Abstracting and Indexing
9.
IEEE J Biomed Health Inform ; 26(2): 572-580, 2022 02.
Article in English | MEDLINE | ID: mdl-34288883

ABSTRACT

This paper proposes a novel deep learning architecture combining Convolutional Neural Network (CNN) layers and Recurrent Neural Network (RNN) layers that can be used to perform segmentation and classification of 5 cardiac rhythms based on ECG recordings. The algorithm is developed in a sequence-to-sequence setting where the input is a sequence of five-second ECG signal sliding windows and the output is a sequence of cardiac rhythm labels. The novel architecture takes as input both the spectrograms of the ECG signal and the heartbeats' signal waveform. Additionally, we are able to train the model in the presence of label noise. The model's performance and generalizability are verified on an external database different from the one used for training. Experimental results show this approach can achieve an average F1 score of 0.89 (averaged across 5 classes). The proposed model also achieves classification performance comparable to the existing state-of-the-art approach with considerably fewer training parameters.
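An F1 score "averaged across classes" is commonly computed as the macro average, that is, the unweighted mean of per-class F1 scores. The sketch below shows that computation under this assumption; the paper does not state which averaging scheme it used, and the rhythm labels in the usage are placeholders.

```python
from typing import Hashable, List, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
             labels: List[Hashable]) -> float:
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, `macro_f1(["N", "N", "AF", "AF"], ["N", "AF", "AF", "AF"], ["N", "AF"])` averages a per-class F1 of 2/3 for "N" and 0.8 for "AF".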


Subject(s)
Arrhythmias, Cardiac , Electrocardiography , Algorithms , Arrhythmias, Cardiac/diagnostic imaging , Heart Rate , Humans , Neural Networks, Computer
10.
IEEE/ACM Trans Comput Biol Bioinform ; 19(4): 1920-1932, 2022.
Article in English | MEDLINE | ID: mdl-34133284

ABSTRACT

Image-based cell counting is a fundamental yet challenging task with wide applications in biological research. In this paper, we propose a novel unified deep network framework designed to solve this problem for various cell types in both 2D and 3D images. Specifically, we first propose SAU-Net for cell counting by extending the segmentation network U-Net with a Self-Attention module. Second, we design an extension of Batch Normalization (BN) to facilitate the training process for small datasets. In addition, a new 3D benchmark dataset based on the existing mouse blastocyst (MBC) dataset is developed and released to the community. Our SAU-Net achieves state-of-the-art results on four benchmark 2D datasets - synthetic fluorescence microscopy (VGG) dataset, Modified Bone Marrow (MBM) dataset, human subcutaneous adipose tissue (ADI) dataset, and Dublin Cell Counting (DCC) dataset, and the new 3D dataset, MBC. The BN extension is validated using extensive experiments on the 2D datasets, since GPU memory constraints preclude use of 3D datasets. The source code is available at https://github.com/mzlr/sau-net.


Subject(s)
Imaging, Three-Dimensional , Microscopy , Animals , Attention , Humans , Image Processing, Computer-Assisted/methods , Mice
12.
Article in English | MEDLINE | ID: mdl-34769911

ABSTRACT

ICEES (Integrated Clinical and Environmental Exposures Service) provides a disease-agnostic, regulatory-compliant approach for openly exposing and analyzing clinical data that have been integrated at the patient level with environmental exposures data. ICEES is equipped with basic features to support exploratory analysis using statistical approaches, such as bivariate chi-square tests. We recently developed a method for using ICEES to generate multivariate tables for subsequent application of machine learning and statistical models. The objective of the present study was to use this approach to identify predictors of asthma exacerbations through the application of three multivariate methods: conditional random forest, conditional tree, and generalized linear model. Among seven potential predictor variables, we found five to be of significant importance using both conditional random forest and conditional tree: prednisone, race, airborne particulate exposure, obesity, and sex. The conditional tree method additionally identified several significant two-way and three-way interactions among the same variables. When we applied a generalized linear model, we identified four significant predictor variables, namely prednisone, race, airborne particulate exposure, and obesity. When ranked in order by effect size, the results were in agreement with the results from the conditional random forest and conditional tree methods as well as the published literature. Our results suggest that the open multivariate analytic capabilities provided by ICEES are valid in the context of an asthma use case and likely will have broad value in advancing open research in environmental and public health.
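As a reminder of what a bivariate chi-square test computes, the sketch below derives the Pearson chi-square statistic from a contingency table of counts. It is a generic textbook calculation, not ICEES code, and the function name is hypothetical; it omits the p-value, which would require the chi-square distribution.

```python
from typing import List


def chi_square_statistic(table: List[List[float]]) -> float:
    """Pearson chi-square statistic for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two variables.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat
```

The statistic is compared against a chi-square distribution with (r - 1)(c - 1) degrees of freedom to judge whether the two binned variables are associated.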


Subject(s)
Asthma , Environmental Exposure , Asthma/epidemiology , Asthma/etiology , Humans , Machine Learning , Models, Statistical
13.
Cell Rep ; 37(2): 109802, 2021 10 12.
Article in English | MEDLINE | ID: mdl-34644582

ABSTRACT

Tissue-clearing methods allow every cell in the mouse brain to be imaged without physical sectioning. However, the computational tools currently available for cell quantification in cleared tissue images have been limited to counting sparse cell populations in stereotypical mice. Here, we introduce NuMorph, a group of analysis tools to quantify all nuclei and nuclear markers within the mouse cortex after clearing and imaging by light-sheet microscopy. We apply NuMorph to investigate two distinct mouse models: a Topoisomerase 1 (Top1) model with severe neurodegenerative deficits and a Neurofibromin 1 (Nf1) model with a more subtle brain overgrowth phenotype. In each case, we identify differential effects of gene deletion on individual cell-type counts and distribution across cortical regions that manifest as alterations of gross brain morphology. These results underline the value of whole-brain imaging approaches, and the tools are widely applicable for studying brain structure phenotypes at cellular resolution.


Subject(s)
Cell Nucleus/pathology , Cerebral Cortex/pathology , Histocytological Preparation Techniques , Nerve Degeneration , Neuroglia/pathology , Neuroimaging , Neurons/pathology , Animals , Cell Nucleus/metabolism , Cerebral Cortex/metabolism , DNA Topoisomerases, Type I/deficiency , DNA Topoisomerases, Type I/genetics , Gene Deletion , Genes, Neurofibromatosis 1 , Image Processing, Computer-Assisted , Mice, Knockout , Neuroglia/metabolism , Neurons/metabolism , Phenotype , Support Vector Machine
14.
JAMIA Open ; 4(3): ooaa069, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34514351

ABSTRACT

OBJECTIVES: Social determinants of health (SDH), key contributors to health, are rarely systematically measured and collected in the electronic health record (EHR). We investigate how to leverage clinical notes using novel applications of multi-label learning (MLL) to classify SDH in mental health and substance use disorder patients who frequent the emergency department. METHODS AND MATERIALS: We labeled a gold-standard corpus of EHR clinical note sentences (N = 4063) with 6 identified SDH-related domains recommended by the Institute of Medicine for inclusion in the EHR. We then trained 5 classification models: linear-Support Vector Machine, K-Nearest Neighbors, Random Forest, XGBoost, and bidirectional Long Short-Term Memory (BI-LSTM). We adopted 5 common evaluation measures: accuracy, average precision-recall (AP), area under the curve receiver operating characteristic (AUC-ROC), Hamming loss, and log loss to compare the performance of different methods for MLL classification using the F1 score as the primary evaluation metric. RESULTS: Our results suggested that, overall, BI-LSTM outperformed the other classification models in terms of AUC-ROC (93.9), AP (0.76), and Hamming loss (0.12). The AUC-ROC values of MLL models of SDH-related domains ranged from 0.59 to 1.0. We found that 44.6% of our study population (N = 1119) had at least one positive documentation of SDH. DISCUSSION AND CONCLUSION: The proposed approach of training an MLL model on an SDH-rich data source can produce a high-performing classifier using only unstructured clinical notes. We also provide evidence that model performance is associated with lexical diversity by health professionals and the auto-generation of clinical note sentences to document SDH.
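Hamming loss, one of the measures listed above, is the fraction of individual label slots that a multi-label classifier gets wrong. A minimal sketch, assuming labels are encoded as equal-length 0/1 vectors (one position per SDH domain); the encoding and function name are illustrative choices.

```python
from typing import Sequence


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of (sample, label) slots where prediction and truth differ."""
    wrong = 0
    total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            wrong += int(t != p)
            total += 1
    return wrong / total
```

Lower is better: a loss of 0.12 would mean that 12% of label slots were predicted incorrectly.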

15.
J Med Internet Res ; 23(10): e31400, 2021 10 11.
Article in English | MEDLINE | ID: mdl-34533459

ABSTRACT

BACKGROUND: Many countries have experienced 2 predominant waves of COVID-19-related hospitalizations. Comparing the clinical trajectories of patients hospitalized in separate waves of the pandemic enables further understanding of the evolving epidemiology, pathophysiology, and health care dynamics of the COVID-19 pandemic. OBJECTIVE: In this retrospective cohort study, we analyzed electronic health record (EHR) data from patients with SARS-CoV-2 infections hospitalized in participating health care systems representing 315 hospitals across 6 countries. We compared hospitalization rates, severe COVID-19 risk, and mean laboratory values between patients hospitalized during the first and second waves of the pandemic. METHODS: Using a federated approach, each participating health care system extracted patient-level clinical data on their first and second wave cohorts and submitted aggregated data to the central site. Data quality control steps were adopted at the central site to correct for implausible values and harmonize units. Statistical analyses were performed by computing individual health care system effect sizes and synthesizing these using random effect meta-analyses to account for heterogeneity. We focused the laboratory analysis on C-reactive protein (CRP), ferritin, fibrinogen, procalcitonin, D-dimer, and creatinine based on their reported associations with severe COVID-19. RESULTS: Data were available for 79,613 patients, of which 32,467 were hospitalized in the first wave and 47,146 in the second wave. The prevalence of male patients and patients aged 50 to 69 years decreased significantly between the first and second waves. Patients hospitalized in the second wave had a 9.9% reduction in the risk of severe COVID-19 compared to patients hospitalized in the first wave (95% CI 8.5%-11.3%). 
Demographic subgroup analyses indicated that patients aged 26 to 49 years and 50 to 69 years; male and female patients; and black patients had significantly lower risk for severe disease in the second wave than in the first wave. At admission, the mean values of CRP were significantly lower in the second wave than in the first wave. On the seventh hospital day, the mean values of CRP, ferritin, fibrinogen, and procalcitonin were significantly lower in the second wave than in the first wave. In general, countries exhibited variable changes in laboratory testing rates from the first to the second wave. At admission, there was a significantly higher testing rate for D-dimer in France, Germany, and Spain. CONCLUSIONS: Patients hospitalized in the second wave were at significantly lower risk for severe COVID-19. This corresponded to mean laboratory values in the second wave that were more likely to be in typical physiological ranges on the seventh hospital day compared to the first wave. Our federated approach demonstrated the feasibility and power of harmonizing heterogeneous EHR data from multiple international health care systems to rapidly conduct large-scale studies to characterize how COVID-19 clinical trajectories evolve.
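The pooling step in a random-effects meta-analysis can be sketched as follows. This uses the DerSimonian-Laird estimator of between-site variance, which is one common choice; the abstract does not say which estimator the authors used, so that is an assumption, as are the function and variable names.

```python
from typing import List, Tuple


def dersimonian_laird(effects: List[float],
                      variances: List[float]) -> Tuple[float, float, float]:
    """Pool per-site effect sizes; return (pooled effect, standard error, tau^2)."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    # Fixed-effect estimate, needed for Cochran's Q.
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w * w for w in weights) / sum_w
    # Between-site variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each site's variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = (1.0 / sum(re_weights)) ** 0.5
    return pooled, se, tau2
```

In a federated design like the one described, only the per-site effect sizes and their variances need to leave each health care system.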


Subject(s)
COVID-19 , Pandemics , Adult , Aged , Female , Hospitalization , Hospitals , Humans , Male , Middle Aged , Retrospective Studies , SARS-CoV-2
16.
JMIR Public Health Surveill ; 7(9): e29310, 2021 09 02.
Article in English | MEDLINE | ID: mdl-34298500

ABSTRACT

BACKGROUND: As the world faced the pandemic caused by the novel coronavirus disease 2019 (COVID-19), medical professionals, technologists, community leaders, and policy makers sought to understand how best to leverage data for public health surveillance and community education. With this complex public health problem, North Carolinians relied on data from state, federal, and global health organizations to increase their understanding of the pandemic and guide decision-making. OBJECTIVE: We aimed to describe the role that stakeholders involved in COVID-19-related data played in managing the pandemic in North Carolina. The study investigated the processes used by organizations throughout the state in using, collecting, and reporting COVID-19 data. METHODS: We used an exploratory qualitative study design to investigate North Carolina's COVID-19 data collection efforts. To better understand these processes, key informant interviews were conducted with employees from organizations that collected COVID-19 data across the state. We developed an interview guide, and open-ended semistructured interviews were conducted during the period from June through November 2020. Interviews lasted between 30 and 45 minutes and were conducted by data scientists by videoconference. Data were subsequently analyzed using qualitative data analysis software. RESULTS: Results indicated that electronic health records were primary sources of COVID-19 data. Often, data were also used to create dashboards to inform the public or other health professionals, to aid in decision-making, or for reporting purposes. Cross-sector collaboration was cited as a major success. Consistency among metrics and data definitions, data collection processes, and contact tracing were cited as challenges. CONCLUSIONS: Findings suggest that, during future outbreaks, organizations across regions could benefit from data centralization and data governance. Data should be publicly accessible and in a user-friendly format. 
Additionally, established cross-sector collaboration networks are demonstrably beneficial for public health professionals across the state as these established relationships facilitate a rapid response to evolving public health challenges.


Subject(s)
COVID-19/epidemiology , Data Analysis , Data Collection , Pandemics/prevention & control , Stakeholder Participation/psychology , Female , Health Education , Humans , Male , North Carolina/epidemiology , Public Health Surveillance , Qualitative Research
17.
Article in English | MEDLINE | ID: mdl-35875189

ABSTRACT

The Integrated Clinical and Environmental Exposures Service (ICEES) provides regulatory-compliant open access to sensitive patient data that have been integrated with public exposures data. ICEES was designed initially to support dynamic cohort creation and bivariate contingency tests. The objective of the present study was to develop an open approach to support multivariate analyses using existing ICEES functionalities and abiding by all regulatory constraints. We first developed an open approach for generating a multivariate table that maintains contingencies between clinical and environmental variables using programmatic calls to the open ICEES application programming interface. We then applied the approach to data on a large cohort (N = 22,365) of patients with asthma or related conditions and generated an eight-feature table. Due to regulatory constraints, data loss was incurred with the incorporation of each successive feature variable, from a starting sample size of N = 22,365 to a final sample size of N = 4,556 (20.4%), but data loss was < 10% until the addition of the final two feature variables. We then applied a generalized linear model to the subsequent dataset and focused on the impact of seven select feature variables on asthma exacerbations, defined as annual emergency department or inpatient visits for respiratory issues. We identified five feature variables-sex, race, obesity, prednisone, and airborne particulate exposure-as significant predictors of asthma exacerbations. We discuss the advantages and disadvantages of ICEES open multivariate analysis and conclude that, despite limitations, ICEES can provide a valuable resource for open multivariate analysis and can serve as an exemplar for regulatory-compliant informatic solutions to open patient data, with capabilities to explore the impact of environmental exposures on health outcomes.

18.
JMIR Med Inform ; 8(1): e16042, 2020 Jan 24.
Article in English | MEDLINE | ID: mdl-32012059

ABSTRACT

Computable phenotypes are algorithms that translate clinical features into code that can be run against electronic health record (EHR) data to define patient cohorts. However, computable phenotypes that only make use of structured EHR data do not capture the full richness of a patient's medical record. While natural language processing (NLP) methods have shown success in extracting clinical features from text, the use of such tools has generally been limited to research groups with substantial NLP expertise. Our goal was to develop an open-source phenotyping software, Clinical Annotation Research Kit (CLARK), that would enable clinical and translational researchers to use machine learning-based NLP for computable phenotyping without requiring deep informatics expertise. CLARK enables nonexpert users to mine text using machine learning classifiers by specifying features for the software to match in clinical notes. Once the features are defined, the user-friendly CLARK interface allows the user to choose from a variety of standard machine learning algorithms (linear support vector machine, Gaussian Naïve Bayes, decision tree, and random forest), cross-validation methods, and the number of folds (cross-validation splits) to be used in evaluation of the classifier. Example phenotypes where CLARK has been applied include pediatric diabetes (sensitivity=0.91; specificity=0.98), symptomatic uterine fibroids (positive predictive value=0.81; negative predictive value=0.54), nonalcoholic fatty liver disease (sensitivity=0.90; specificity=0.94), and primary ciliary dyskinesia (sensitivity=0.88; specificity=1.0). In each of these use cases, CLARK allowed investigators to incorporate variables into their phenotype algorithm that would not be available as structured data. Moreover, the fact that nonexpert users can get started with machine learning-based NLP with limited informatics involvement is a significant improvement over the status quo. 
We hope to disseminate CLARK to other organizations that may not have NLP or machine learning specialists available, enabling wider use of these methods.
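The feature-matching step described in this abstract can be illustrated with a small sketch: user-specified regular expressions are counted in each note, producing a numeric vector that a standard classifier could consume. This is not CLARK's code; the pattern names, expressions, and function name are hypothetical.

```python
import re
from typing import Dict, List


def extract_features(note: str, patterns: Dict[str, str]) -> List[int]:
    """Count case-insensitive matches of each named pattern in a note.

    The output order follows the insertion order of `patterns`.
    """
    return [len(re.findall(pattern, note, flags=re.IGNORECASE))
            for pattern in patterns.values()]


# Hypothetical feature definitions for a diabetes phenotype.
EXAMPLE_PATTERNS = {
    "diabetes_mention": r"\bdiabet\w*",
    "insulin_mention": r"\binsulin\b",
}
```

The resulting vectors, paired with gold-standard labels, could then be split into cross-validation folds to estimate sensitivity and specificity.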

19.
JMIR Med Inform ; 7(4): e15199, 2019 Oct 16.
Article in English | MEDLINE | ID: mdl-31621639

ABSTRACT

BACKGROUND: In a multisite clinical research collaboration, institutions may or may not use the same common data model (CDM) to store clinical data. To overcome this challenge, we proposed to use Health Level 7's Fast Healthcare Interoperability Resources (FHIR) as a meta-CDM-a single standard to represent clinical data. OBJECTIVE: In this study, we aimed to create an open-source application termed the Clinical Asset Mapping Program for FHIR (CAMP FHIR) to efficiently transform clinical data to FHIR for supporting source-agnostic CDM-to-FHIR mapping. METHODS: Mapping with CAMP FHIR involves (1) mapping each source variable to its corresponding FHIR element and (2) mapping each item in the source data's value sets to the corresponding FHIR value set item for variables with strict value sets. To date, CAMP FHIR has been used to transform 108 variables from the Informatics for Integrating Biology & the Bedside (i2b2) and Patient-Centered Outcomes Research Network data models to fields across 7 FHIR resources. It is designed to allow input from any source data model and will support additional FHIR resources in the future. RESULTS: We have used CAMP FHIR to transform data on approximately 23,000 patients with asthma from our institution's i2b2 database. Data quality and integrity were validated against the origin point of the data, our enterprise clinical data warehouse. CONCLUSIONS: We believe that CAMP FHIR can serve as an alternative to implementing new CDMs on a project-by-project basis. Moreover, the use of FHIR as a CDM could support rare data sharing opportunities, such as collaborations between academic medical centers and community hospitals. We anticipate adoption and use of CAMP FHIR to foster sharing of clinical data across institutions for downstream applications in translational research.
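The two mapping steps in the methods (source variable to FHIR element, then source value to FHIR value-set item) can be sketched with plain dictionaries. This is not CAMP FHIR's implementation; the source column names and the `map_record` function are hypothetical. The target codes for `Patient.gender` (`male`, `female`, `other`, `unknown`) come from the FHIR administrative gender value set.

```python
from typing import Dict

# Step 1: source variable -> FHIR element (source names are hypothetical).
FIELD_MAP: Dict[str, str] = {
    "sex_cd": "Patient.gender",
    "birth_date": "Patient.birthDate",
}

# Step 2: source value -> FHIR value-set item, only for strict value sets.
VALUE_MAP: Dict[str, Dict[str, str]] = {
    "sex_cd": {"M": "male", "F": "female", "O": "other"},
}


def map_record(row: Dict[str, str]) -> Dict[str, str]:
    """Map one source row to FHIR element paths, dropping unmapped columns."""
    mapped: Dict[str, str] = {}
    for source_field, value in row.items():
        target = FIELD_MAP.get(source_field)
        if target is None:
            continue  # no mapping defined for this source variable
        if source_field in VALUE_MAP:
            # Strict value set: unrecognized codes fall back to "unknown".
            value = VALUE_MAP[source_field].get(value, "unknown")
        mapped[target] = value
    return mapped
```

Keeping both maps as data rather than code is what makes this kind of approach source-agnostic: supporting another data model means supplying new dictionaries.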

20.
J Am Med Inform Assoc ; 26(10): 1064-1073, 2019 10 01.
Article in English | MEDLINE | ID: mdl-31077269

ABSTRACT

OBJECTIVE: This study aimed to develop a novel, regulatory-compliant approach for openly exposing integrated clinical and environmental exposures data: the Integrated Clinical and Environmental Exposures Service (ICEES). MATERIALS AND METHODS: The driving clinical use case for research and development of ICEES was asthma, which is a common disease influenced by hundreds of genes and a plethora of environmental exposures, including exposures to airborne pollutants. We developed a pipeline for integrating clinical data on patients with asthma-like conditions with data on environmental exposures derived from multiple public data sources. The data were integrated at the patient and visit level and used to create de-identified, binned, "integrated feature tables," which were then placed behind an OpenAPI. RESULTS: Our preliminary evaluation results demonstrate a relationship between exposure to high levels of particulate matter ≤2.5 µm in diameter (PM2.5) and the frequency of emergency department or inpatient visits for respiratory issues. For example, 16.73% of patients with average daily exposure to PM2.5 >9.62 µg/m3 experienced 2 or more emergency department or inpatient visits for respiratory issues in year 2010 compared with 7.93% of patients with lower exposures (n = 23 093). DISCUSSION: The results validated our overall approach for openly exposing and sharing integrated clinical and environmental exposures data. We plan to iteratively refine and expand ICEES by including additional years of data, feature variables, and disease cohorts. CONCLUSIONS: We believe that ICEES will serve as a regulatory-compliant model and approach for promoting open access to and sharing of integrated clinical and environmental exposures data.
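The comparison in the results can be reproduced in miniature: bin each patient by PM2.5 exposure, then compute the share of each bin with 2 or more visits. The 9.62 cutoff is the one quoted in the abstract; the record layout and function name are assumptions made for this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

PM25_CUTOFF = 9.62  # average daily exposure cutoff quoted in the abstract


def share_with_repeat_visits(records: Iterable[Tuple[float, int]],
                             cutoff: float = PM25_CUTOFF,
                             min_visits: int = 2) -> Dict[str, float]:
    """For (pm25, visit_count) records, return the share per exposure bin
    of patients with at least `min_visits` visits."""
    totals: Dict[str, int] = defaultdict(int)
    repeats: Dict[str, int] = defaultdict(int)
    for pm25, visits in records:
        exposure_bin = "high" if pm25 > cutoff else "low"
        totals[exposure_bin] += 1
        if visits >= min_visits:
            repeats[exposure_bin] += 1
    return {name: repeats[name] / totals[name] for name in totals}
```

Working on binned values rather than raw measurements is also what allows tables like these to be exposed without releasing patient-level detail.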


Subject(s)
Asthma , Datasets as Topic , Environmental Exposure , Information Dissemination , Intersectoral Collaboration , Translational Research, Biomedical , Access to Information , Censuses , Computational Biology , Female , Government Regulation , Humans , Male , Particulate Matter , United States , User-Computer Interface