Results 1 - 20 of 42
1.
N C Med J ; 85(4): 256-259, 2024 Jun.
Article in English | MEDLINE | ID: mdl-39466095

ABSTRACT

As a biomedical data scientist, when I think of the future of artificial intelligence in health care, the potential fills me with both excitement and caution. A promising area of innovation, AI can be used to assess the impact of social determinants of health on health outcomes, though more standardization is needed.


Subject(s)
Artificial Intelligence , Social Determinants of Health , Humans , Delivery of Health Care
2.
Res Sq ; 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38883709

ABSTRACT

Accurate identification of acute coronary syndrome (ACS) in the prehospital setting is important for timely treatments that reduce damage to the compromised myocardium. Current machine learning approaches lack sufficient performance to safely rule in or rule out ACS. Our goal is to identify a method that bridges this gap. To do so, we retrospectively evaluate two promising approaches, an ensemble of gradient boosted decision trees (GBDT) and selective classification (SC), on consecutive patients transported by ambulance to the ED with chest pain and/or anginal equivalents. On the task of ACS classification with 23 prehospital covariates, we found that the fusion of the two (GBDT+SC) improves the best reported sensitivity and specificity by 8% and 23%, respectively. Accordingly, GBDT+SC is safer than current machine learning approaches to rule-in and rule-out of ACS in the prehospital setting.
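
Selective classification, as named in the abstract above, lets a model abstain when it is not confident enough. The following is a minimal sketch of that idea only, not the authors' method: the probabilities stand in for the output of a GBDT ensemble, and the `low` and `high` thresholds are hypothetical values chosen for illustration.

```python
from typing import List, Optional


def selective_predict(prob_acs: float, low: float = 0.1, high: float = 0.9) -> Optional[int]:
    """Return 1 (rule in), 0 (rule out), or None (abstain).

    `low` and `high` are hypothetical confidence thresholds, not values
    reported in the abstract.
    """
    if prob_acs >= high:
        return 1
    if prob_acs <= low:
        return 0
    return None  # too uncertain: defer to standard clinical work-up


def coverage(decisions: List[Optional[int]]) -> float:
    """Fraction of cases on which the classifier did not abstain."""
    return sum(d is not None for d in decisions) / len(decisions)


# Made-up probabilities standing in for GBDT ensemble outputs.
scores = [0.02, 0.45, 0.95, 0.08, 0.60]
decisions = [selective_predict(p) for p in scores]
```

Tightening the thresholds trades coverage for accuracy on the cases that are still classified, which is the usual design lever in this family of methods.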

3.
Front Public Health ; 12: 1347862, 2024.
Article in English | MEDLINE | ID: mdl-38737862

ABSTRACT

The COVID-19 pandemic has necessitated the development of robust tools for tracking and modeling the spread of the virus. We present 'K-Track-Covid,' an interactive web-based dashboard developed using the R Shiny framework, which offers users an intuitive way to analyze the geographical and temporal spread of COVID-19 in South Korea. Our dashboard employs dynamic user interface elements and validated epidemiological models, and integrates regional data to offer tailored visual displays. The dashboard allows users to customize their data views by selecting specific time frames, geographic regions, and demographic groups. This customization enables the generation of charts and statistical summaries pertinent to both daily fluctuations and cumulative counts of COVID-19 cases, as well as mortality statistics. Additionally, the dashboard offers a simulation model based on mathematical models, enabling users to make predictions under various parameter settings. The dashboard is designed to assist researchers, policymakers, and the public in understanding the spread and impact of COVID-19, thereby facilitating informed decision-making. All data and resources related to this study are publicly available to ensure transparency and facilitate further research.


Subject(s)
COVID-19 , Internet , Humans , Republic of Korea/epidemiology , COVID-19/epidemiology , SARS-CoV-2 , User-Computer Interface , Pandemics , Epidemiological Models
4.
J Am Med Inform Assoc ; 30(7): 1293-1300, 2023 06 20.
Article in English | MEDLINE | ID: mdl-37192819

ABSTRACT

Research increasingly relies on interrogating large-scale data resources. The NIH National Heart, Lung, and Blood Institute developed the NHLBI BioData Catalyst® (BDC), a community-driven ecosystem where researchers, including bench and clinical scientists, statisticians, and algorithm developers, find, access, share, store, and compute on large-scale datasets. This ecosystem provides secure, cloud-based workspaces, user authentication and authorization, search, tools and workflows, applications, and innovative features to address community needs, including exploratory data analysis, genomic and imaging tools, tools for reproducibility, and improved interoperability with other NIH data science platforms. BDC offers straightforward access to large-scale datasets and computational resources that support precision medicine for heart, lung, blood, and sleep conditions, leveraging separately developed and managed platforms to maximize flexibility based on researcher needs, expertise, and backgrounds. Through the NHLBI BioData Catalyst Fellows Program, BDC facilitates scientific discoveries and technological advances. BDC also facilitated accelerated research on the coronavirus disease-2019 (COVID-19) pandemic.


Subject(s)
COVID-19 , Cloud Computing , Humans , Ecosystem , Reproducibility of Results , Lung , Software
5.
Health Informatics J ; 29(2): 14604582231170892, 2023.
Article in English | MEDLINE | ID: mdl-37066514

ABSTRACT

The Integrated Clinical and Environmental Exposures Service (ICEES) provides open regulatory-compliant access to clinical data, including electronic health record data, that have been integrated with environmental exposures data. While ICEES has been validated in the context of an asthma use case and several other use cases, the regulatory constraints on the ICEES open application programming interface (OpenAPI) result in data loss when using the service for multivariate analysis. In this study, we investigated the robustness of the ICEES OpenAPI through a comparative analysis, in which we applied a generalized linear model (GLM) to the OpenAPI data and the constraint-free source data to examine factors predictive of asthma exacerbations. Consistent with previous studies, we found that the main predictors identified by both analyses were sex, prednisone, race, obesity, and airborne particulate exposure. Comparison of GLM model fit revealed that data loss impacts model quality, but only with select interaction terms. We conclude that the ICEES OpenAPI supports multivariate analysis, albeit with potential data loss that users should be aware of.


Subject(s)
Asthma , Electronic Health Records , Humans , Linear Models , Environmental Exposure , Software , Asthma/epidemiology
6.
J Am Med Inform Assoc ; 30(3): 447-455, 2023 02 16.
Article in English | MEDLINE | ID: mdl-36451264

ABSTRACT

OBJECTIVE: This article describes the implementation of a privacy-preserving record linkage (PPRL) solution across PCORnet®, the National Patient-Centered Clinical Research Network. MATERIAL AND METHODS: Using a PPRL solution from Datavant, we quantified the degree of patient overlap across the network and report a de-duplicated analysis of the demographic and clinical characteristics of the PCORnet population. RESULTS: There were ∼170M patient records across the responding Network Partners, with ∼138M (81%) of those corresponding to a unique patient. 82.1% of patients were found in a single partner and 14.7% were in 2. The percentage overlap between Partners ranged between 0% and 80% with a median of 0%. Linking patients' electronic health records with claims increased disease prevalence in every clinical characteristic, ranging between 63% and 173%. DISCUSSION: The overlap between Partners was variable and depended on timeframe. However, patient data linkage changed the prevalence profile of the PCORnet patient population. CONCLUSIONS: This project was one of the largest linkage efforts of its kind and demonstrates the potential value of record linkage. Linkage between Partners may be most useful in cases where there is geographic proximity between Partners, an expectation that potential linkage Partners will be able to fill gaps in data, or a longer study timeframe.


Subject(s)
Confidentiality , Privacy , Humans , Medical Record Linkage , Computer Security , Electronic Health Records , Patient-Centered Care , Demography
7.
BMC Res Notes ; 15(1): 337, 2022 Oct 31.
Article in English | MEDLINE | ID: mdl-36316778

ABSTRACT

OBJECTIVE: The aim of this study was to determine whether a secure, privacy-preserving record linkage (PPRL) methodology can be implemented in a scalable manner for use in a large national clinical research network. RESULTS: We established the governance and technical capacity to support the use of PPRL across the National Patient-Centered Clinical Research Network (PCORnet®). As a pilot, four sites used the Datavant software to transform patient personally identifiable information (PII) into de-identified tokens. We queried the sites for patients with a clinical encounter in 2018 or 2019 and matched their tokens to determine whether overlap existed. We described patient overlap among the sites and generated a "deduplicated" table of patient demographic characteristics. Overlapping patients were found in 3 of the 6 site-pairs. Following deduplication, the total patient count was 3,108,515 (0.11% reduction), with the largest reduction in count for patients with an "Other/Missing" value for Sex, from 198 to 163 (17.6% reduction). The PPRL solution successfully links patients across data sources using distributed queries without directly accessing patient PII. The overlap queries and analysis performed in this pilot are being replicated across the full network to provide additional insight into patient linkages among a distributed research network.


Subject(s)
Electronic Health Records , Privacy , Humans , Medical Record Linkage/methods , Databases, Factual , Patient-Centered Care
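
The token-matching step described in the abstract above can be illustrated with a generic keyed-hash scheme. This is a minimal sketch under stated assumptions and not the Datavant algorithm, whose details the abstract does not give: the field choice, the normalization rules, the `SECRET` key, and all patient values are hypothetical.

```python
import hashlib
import hmac
import unicodedata


def normalize(value: str) -> str:
    """Lower-case, trim, and strip accents so equivalent spellings match."""
    decomposed = unicodedata.normalize("NFKD", value)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_accents.strip().lower()


def make_token(first: str, last: str, dob: str, sex: str, secret: bytes) -> str:
    """Derive a de-identified token from PII with a keyed hash (HMAC-SHA256)."""
    message = "|".join(normalize(part) for part in (first, last, dob, sex))
    return hmac.new(secret, message.encode("utf-8"), hashlib.sha256).hexdigest()


SECRET = b"shared-demo-key"  # hypothetical key shared by participating sites

# Made-up records; each site tokenizes locally and shares only the tokens.
site_a = {
    make_token("Ana", "García", "1980-02-01", "F", SECRET),
    make_token("Bo", "Lee", "1975-07-30", "M", SECRET),
}
site_b = {
    make_token(" ana ", "GARCIA", "1980-02-01", "f", SECRET),
    make_token("Cy", "Ng", "1990-01-01", "M", SECRET),
}

overlap = site_a & site_b                  # patients seen at both sites
deduplicated_count = len(site_a | site_b)  # unique patients across sites
```

Because only tokens leave each site, the overlap and the deduplicated count can be computed without any party seeing another site's PII.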
8.
Front Artif Intell ; 5: 918888, 2022.
Article in English | MEDLINE | ID: mdl-35837616

ABSTRACT

Research on rare diseases has received increasing attention, in part due to the realized profitability of orphan drugs. Biomedical informatics holds promise in accelerating translational research on rare disease, yet challenges remain, including the lack of diagnostic codes for rare diseases and privacy concerns that prevent research access to electronic health records when few patients exist. The Integrated Clinical and Environmental Exposures Service (ICEES) provides regulatory-compliant open access to electronic health record data that have been integrated with environmental exposures data, as well as analytic tools to explore the integrated data. We describe a proof-of-concept application of ICEES to examine demographics, clinical characteristics, environmental exposures, and health outcomes among a cohort of patients enriched for phenotypes associated with cystic fibrosis (CF), idiopathic bronchiectasis (IB), and primary ciliary dyskinesia (PCD). We then focus on a subset of patients with CF, leveraging the availability of a diagnostic code for CF and serving as a benchmark for our development work. We use ICEES to examine select demographics, co-diagnoses, and environmental exposures that may contribute to poor health outcomes among patients with CF, defined as emergency department or inpatient visits for respiratory issues. We replicate current understanding of the pathogenesis and clinical manifestations of CF by identifying co-diagnoses of asthma, chronic nasal congestion, cough, middle ear disease, and pneumonia as factors that differentiate patients with poor health outcomes from those with better health outcomes. We conclude by discussing our preliminary findings in relation to other published work, the strengths and limitations of our approach, and our future directions.

9.
Bioinformatics ; 38(12): 3252-3258, 2022 06 13.
Article in English | MEDLINE | ID: mdl-35441678

ABSTRACT

MOTIVATION: As the number of public data resources continues to proliferate, identifying relevant datasets across heterogenous repositories is becoming critical to answering scientific questions. To help researchers navigate this data landscape, we developed Dug: a semantic search tool for biomedical datasets utilizing evidence-based relationships from curated knowledge graphs to find relevant datasets and explain why those results are returned. RESULTS: Developed through the National Heart, Lung and Blood Institute's (NHLBI) BioData Catalyst ecosystem, Dug has indexed more than 15 911 study variables from public datasets. On a manually curated search dataset, Dug's total recall (total relevant results/total results) of 0.79 outperformed default Elasticsearch's total recall of 0.76. When using synonyms or related concepts as search queries, Dug (0.36) far outperformed Elasticsearch (0.14) in terms of total recall with no significant loss in the precision of its top results. AVAILABILITY AND IMPLEMENTATION: Dug is freely available at https://github.com/helxplatform/dug. An example Dug deployment is also available for use at https://search.biodatacatalyst.renci.org/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Search Engine , Semantics , Ecosystem , Abstracting and Indexing
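
To illustrate why searching with related concepts can raise recall, as reported in the abstract above, here is a toy query-expansion sketch. It does not use the Dug codebase or its API: the `RELATED` mapping stands in for a curated knowledge graph, the variable descriptions are made up, and recall is computed in the conventional way (relevant items returned divided by all relevant items).

```python
from typing import Dict, List, Set

# Hypothetical stand-in for a curated knowledge graph: term -> related concepts.
RELATED: Dict[str, Set[str]] = {
    "heart attack": {"myocardial infarction"},
    "myocardial infarction": {"heart attack"},
}

# Made-up study variables (id -> description).
VARIABLES: Dict[str, str] = {
    "v1": "history of myocardial infarction",
    "v2": "age at enrollment",
    "v3": "self-reported heart attack",
}


def search(query: str, expand: bool) -> List[str]:
    """Return ids of variables whose description mentions any search term."""
    terms = {query.lower()}
    if expand:
        terms |= RELATED.get(query.lower(), set())
    return sorted(
        vid for vid, desc in VARIABLES.items()
        if any(term in desc.lower() for term in terms)
    )


def recall(returned: List[str], relevant: Set[str]) -> float:
    """Relevant items returned divided by all relevant items."""
    return len(set(returned) & relevant) / len(relevant)


relevant = {"v1", "v3"}
plain = search("heart attack", expand=False)
expanded = search("heart attack", expand=True)
```

In this toy data the plain keyword search misses the synonymously worded variable, while the expanded search finds both relevant variables.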
10.
IEEE/ACM Trans Comput Biol Bioinform ; 19(4): 1920-1932, 2022.
Article in English | MEDLINE | ID: mdl-34133284

ABSTRACT

Image-based cell counting is a fundamental yet challenging task with wide applications in biological research. In this paper, we propose a novel unified deep network framework designed to solve this problem for various cell types in both 2D and 3D images. Specifically, we first propose SAU-Net for cell counting by extending the segmentation network U-Net with a Self-Attention module. Second, we design an extension of Batch Normalization (BN) to facilitate the training process for small datasets. In addition, a new 3D benchmark dataset based on the existing mouse blastocyst (MBC) dataset is developed and released to the community. Our SAU-Net achieves state-of-the-art results on four benchmark 2D datasets - synthetic fluorescence microscopy (VGG) dataset, Modified Bone Marrow (MBM) dataset, human subcutaneous adipose tissue (ADI) dataset, and Dublin Cell Counting (DCC) dataset, and the new 3D dataset, MBC. The BN extension is validated using extensive experiments on the 2D datasets, since GPU memory constraints preclude use of 3D datasets. The source code is available at https://github.com/mzlr/sau-net.


Subject(s)
Imaging, Three-Dimensional , Microscopy , Animals , Attention , Humans , Image Processing, Computer-Assisted/methods , Mice
11.
IEEE J Biomed Health Inform ; 26(2): 572-580, 2022 02.
Article in English | MEDLINE | ID: mdl-34288883

ABSTRACT

This paper proposes a novel deep learning architecture that combines convolutional neural network (CNN) layers and recurrent neural network (RNN) layers to perform segmentation and classification of 5 cardiac rhythms based on ECG recordings. The algorithm is developed in a sequence-to-sequence setting where the input is a sequence of five-second ECG signal sliding windows and the output is a sequence of cardiac rhythm labels. The novel architecture takes as input both the spectrograms of the ECG signal and the heartbeats' signal waveform. Additionally, we are able to train the model in the presence of label noise. The model's performance and generalizability are verified on an external database different from the one used for training. Experimental results show that this approach can achieve an average F1 score of 0.89 (averaged across 5 classes). The proposed model also achieves classification performance comparable to the existing state-of-the-art approach with considerably fewer training parameters.


Subject(s)
Arrhythmias, Cardiac , Electrocardiography , Algorithms , Arrhythmias, Cardiac/diagnostic imaging , Heart Rate , Humans , Neural Networks, Computer
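
Two ideas from the abstract above, cutting an ECG into five-second sliding windows and averaging F1 across rhythm classes, can be sketched as follows. This is an illustration only: the stride, the sampling rate, and the rhythm labels ("N", "AF", "VT") are hypothetical, since the abstract does not state them, and the average is assumed to be an unweighted (macro) mean.

```python
from typing import List, Sequence


def sliding_windows(signal: Sequence[float], fs: int,
                    window_s: float = 5.0, stride_s: float = 1.0) -> List[Sequence[float]]:
    """Cut a signal sampled at `fs` Hz into fixed-length overlapping windows."""
    size = int(window_s * fs)
    step = int(stride_s * fs)
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Ten seconds of a dummy signal at a (hypothetical) 4 Hz sampling rate.
windows = sliding_windows([0.0] * 40, fs=4)

# Made-up per-window rhythm labels.
y_true = ["N", "N", "AF", "AF", "VT"]
y_pred = ["N", "AF", "AF", "AF", "VT"]
score = macro_f1(y_true, y_pred, labels=["N", "AF", "VT"])
```

Macro averaging gives every rhythm class equal weight, so a rare rhythm that is classified poorly lowers the score as much as a common one.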
13.
Article in English | MEDLINE | ID: mdl-34769911

ABSTRACT

ICEES (Integrated Clinical and Environmental Exposures Service) provides a disease-agnostic, regulatory-compliant approach for openly exposing and analyzing clinical data that have been integrated at the patient level with environmental exposures data. ICEES is equipped with basic features to support exploratory analysis using statistical approaches, such as bivariate chi-square tests. We recently developed a method for using ICEES to generate multivariate tables for subsequent application of machine learning and statistical models. The objective of the present study was to use this approach to identify predictors of asthma exacerbations through the application of three multivariate methods: conditional random forest, conditional tree, and generalized linear model. Among seven potential predictor variables, we found five to be of significant importance using both conditional random forest and conditional tree: prednisone, race, airborne particulate exposure, obesity, and sex. The conditional tree method additionally identified several significant two-way and three-way interactions among the same variables. When we applied a generalized linear model, we identified four significant predictor variables, namely prednisone, race, airborne particulate exposure, and obesity. When ranked in order by effect size, the results were in agreement with the results from the conditional random forest and conditional tree methods as well as the published literature. Our results suggest that the open multivariate analytic capabilities provided by ICEES are valid in the context of an asthma use case and likely will have broad value in advancing open research in environmental and public health.


Subject(s)
Asthma , Environmental Exposure , Asthma/epidemiology , Asthma/etiology , Humans , Machine Learning , Models, Statistical
14.
Cell Rep ; 37(2): 109802, 2021 10 12.
Article in English | MEDLINE | ID: mdl-34644582

ABSTRACT

Tissue-clearing methods allow every cell in the mouse brain to be imaged without physical sectioning. However, the computational tools currently available for cell quantification in cleared tissue images have been limited to counting sparse cell populations in stereotypical mice. Here, we introduce NuMorph, a group of analysis tools to quantify all nuclei and nuclear markers within the mouse cortex after clearing and imaging by light-sheet microscopy. We apply NuMorph to investigate two distinct mouse models: a Topoisomerase 1 (Top1) model with severe neurodegenerative deficits and a Neurofibromin 1 (Nf1) model with a more subtle brain overgrowth phenotype. In each case, we identify differential effects of gene deletion on individual cell-type counts and distribution across cortical regions that manifest as alterations of gross brain morphology. These results underline the value of whole-brain imaging approaches, and the tools are widely applicable for studying brain structure phenotypes at cellular resolution.


Subject(s)
Cell Nucleus/pathology , Cerebral Cortex/pathology , Histocytological Preparation Techniques , Nerve Degeneration , Neuroglia/pathology , Neuroimaging , Neurons/pathology , Animals , Cell Nucleus/metabolism , Cerebral Cortex/metabolism , DNA Topoisomerases, Type I/deficiency , DNA Topoisomerases, Type I/genetics , Gene Deletion , Genes, Neurofibromatosis 1 , Image Processing, Computer-Assisted , Mice, Knockout , Neuroglia/metabolism , Neurons/metabolism , Phenotype , Support Vector Machine
15.
JAMIA Open ; 4(3): ooaa069, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34514351

ABSTRACT

OBJECTIVES: Social determinants of health (SDH), key contributors to health, are rarely systematically measured and collected in the electronic health record (EHR). We investigate how to leverage clinical notes using novel applications of multi-label learning (MLL) to classify SDH in mental health and substance use disorder patients who frequent the emergency department. METHODS AND MATERIALS: We labeled a gold-standard corpus of EHR clinical note sentences (N = 4063) with 6 identified SDH-related domains recommended by the Institute of Medicine for inclusion in the EHR. We then trained 5 classification models: linear-Support Vector Machine, K-Nearest Neighbors, Random Forest, XGBoost, and bidirectional Long Short-Term Memory (BI-LSTM). We adopted 5 common evaluation measures: accuracy, average precision-recall (AP), area under the receiver operating characteristic curve (AUC-ROC), Hamming loss, and log loss to compare the performance of different methods for MLL classification using the F1 score as the primary evaluation metric. RESULTS: Our results suggested that, overall, BI-LSTM outperformed the other classification models in terms of AUC-ROC (93.9), AP (0.76), and Hamming loss (0.12). The AUC-ROC values of MLL models of SDH-related domains ranged from 0.59 to 1.0. We found that 44.6% of our study population (N = 1119) had at least one positive documentation of SDH. DISCUSSION AND CONCLUSION: The proposed approach of training an MLL model on an SDH-rich data source can produce a high-performing classifier using only unstructured clinical notes. We also provide evidence that model performance is associated with lexical diversity by health professionals and the auto-generation of clinical note sentences to document SDH.
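
Hamming loss, one of the multi-label measures listed in the abstract above, is the fraction of individual label decisions that are wrong. A minimal sketch with made-up data follows; each row is one sentence and each column one of 6 domains, and the label values are invented for illustration.

```python
from typing import List


def hamming_loss(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Fraction of (sentence, label) pairs where prediction and truth differ."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


# Two made-up sentences, each with 6 binary SDH-domain labels.
y_true = [[1, 0, 0, 1, 0, 0],
          [0, 0, 1, 0, 0, 0]]
y_pred = [[1, 0, 0, 0, 0, 0],
          [0, 1, 1, 0, 0, 0]]

loss = hamming_loss(y_true, y_pred)  # 2 wrong decisions out of 12
```

Unlike exact-match accuracy, this measure gives partial credit when most of a sentence's labels are right, so lower values are better.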

16.
J Med Internet Res ; 23(10): e31400, 2021 10 11.
Article in English | MEDLINE | ID: mdl-34533459

ABSTRACT

BACKGROUND: Many countries have experienced 2 predominant waves of COVID-19-related hospitalizations. Comparing the clinical trajectories of patients hospitalized in separate waves of the pandemic enables further understanding of the evolving epidemiology, pathophysiology, and health care dynamics of the COVID-19 pandemic. OBJECTIVE: In this retrospective cohort study, we analyzed electronic health record (EHR) data from patients with SARS-CoV-2 infections hospitalized in participating health care systems representing 315 hospitals across 6 countries. We compared hospitalization rates, severe COVID-19 risk, and mean laboratory values between patients hospitalized during the first and second waves of the pandemic. METHODS: Using a federated approach, each participating health care system extracted patient-level clinical data on their first and second wave cohorts and submitted aggregated data to the central site. Data quality control steps were adopted at the central site to correct for implausible values and harmonize units. Statistical analyses were performed by computing individual health care system effect sizes and synthesizing these using random effect meta-analyses to account for heterogeneity. We focused the laboratory analysis on C-reactive protein (CRP), ferritin, fibrinogen, procalcitonin, D-dimer, and creatinine based on their reported associations with severe COVID-19. RESULTS: Data were available for 79,613 patients, of which 32,467 were hospitalized in the first wave and 47,146 in the second wave. The prevalence of male patients and patients aged 50 to 69 years decreased significantly between the first and second waves. Patients hospitalized in the second wave had a 9.9% reduction in the risk of severe COVID-19 compared to patients hospitalized in the first wave (95% CI 8.5%-11.3%). 
Demographic subgroup analyses indicated that patients aged 26 to 49 years and 50 to 69 years; male and female patients; and black patients had significantly lower risk for severe disease in the second wave than in the first wave. At admission, the mean values of CRP were significantly lower in the second wave than in the first wave. On the seventh hospital day, the mean values of CRP, ferritin, fibrinogen, and procalcitonin were significantly lower in the second wave than in the first wave. In general, countries exhibited variable changes in laboratory testing rates from the first to the second wave. At admission, there was a significantly higher testing rate for D-dimer in France, Germany, and Spain. CONCLUSIONS: Patients hospitalized in the second wave were at significantly lower risk for severe COVID-19. This corresponded to mean laboratory values in the second wave that were more likely to be in typical physiological ranges on the seventh hospital day compared to the first wave. Our federated approach demonstrated the feasibility and power of harmonizing heterogeneous EHR data from multiple international health care systems to rapidly conduct large-scale studies to characterize how COVID-19 clinical trajectories evolve.


Subject(s)
COVID-19 , Pandemics , Adult , Aged , Female , Hospitalization , Hospitals , Humans , Male , Middle Aged , Retrospective Studies , SARS-CoV-2
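
The abstract above pools per-system effect sizes with random-effects meta-analysis. The sketch below shows one common way to do that, the DerSimonian-Laird estimator; the abstract does not say which estimator was used, so that choice is an assumption, and the effect sizes and variances are made up.

```python
from typing import List, Tuple


def random_effects_pool(effects: List[float], variances: List[float]) -> Tuple[float, float, float]:
    """Pool site-level effects; returns (pooled effect, standard error, tau^2).

    Uses the DerSimonian-Laird estimate of between-site variance (tau^2),
    which is an assumption here, not a detail taken from the abstract.
    """
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    # Re-weight each site by its within-site plus between-site variance.
    star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(star, effects)) / sum(star)
    se = (1.0 / sum(star)) ** 0.5
    return pooled, se, tau2


# Two made-up health care systems with clearly different effects.
pooled, se, tau2 = random_effects_pool([0.0, 2.0], [0.5, 0.5])
```

When sites disagree more than their sampling error explains, tau^2 becomes positive and widens the pooled standard error, which is how heterogeneity across health care systems is reflected in the result.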
17.
JMIR Public Health Surveill ; 7(9): e29310, 2021 09 02.
Article in English | MEDLINE | ID: mdl-34298500

ABSTRACT

BACKGROUND: As the world faced the pandemic caused by the novel coronavirus disease 2019 (COVID-19), medical professionals, technologists, community leaders, and policy makers sought to understand how best to leverage data for public health surveillance and community education. With this complex public health problem, North Carolinians relied on data from state, federal, and global health organizations to increase their understanding of the pandemic and guide decision-making. OBJECTIVE: We aimed to describe the role that stakeholders involved in COVID-19-related data played in managing the pandemic in North Carolina. The study investigated the processes used by organizations throughout the state in using, collecting, and reporting COVID-19 data. METHODS: We used an exploratory qualitative study design to investigate North Carolina's COVID-19 data collection efforts. To better understand these processes, key informant interviews were conducted with employees from organizations that collected COVID-19 data across the state. We developed an interview guide, and open-ended semistructured interviews were conducted during the period from June through November 2020. Interviews lasted between 30 and 45 minutes and were conducted by data scientists by videoconference. Data were subsequently analyzed using qualitative data analysis software. RESULTS: Results indicated that electronic health records were primary sources of COVID-19 data. Often, data were also used to create dashboards to inform the public or other health professionals, to aid in decision-making, or for reporting purposes. Cross-sector collaboration was cited as a major success. Consistency among metrics and data definitions, data collection processes, and contact tracing were cited as challenges. CONCLUSIONS: Findings suggest that, during future outbreaks, organizations across regions could benefit from data centralization and data governance. Data should be publicly accessible and in a user-friendly format. 
Additionally, established cross-sector collaboration networks are demonstrably beneficial for public health professionals across the state as these established relationships facilitate a rapid response to evolving public health challenges.


Subject(s)
COVID-19/epidemiology , Data Analysis , Data Collection , Pandemics/prevention & control , Stakeholder Participation/psychology , Female , Health Education , Humans , Male , North Carolina/epidemiology , Public Health Surveillance , Qualitative Research
18.
Article in English | MEDLINE | ID: mdl-35875189

ABSTRACT

The Integrated Clinical and Environmental Exposures Service (ICEES) provides regulatory-compliant open access to sensitive patient data that have been integrated with public exposures data. ICEES was designed initially to support dynamic cohort creation and bivariate contingency tests. The objective of the present study was to develop an open approach to support multivariate analyses using existing ICEES functionalities and abiding by all regulatory constraints. We first developed an open approach for generating a multivariate table that maintains contingencies between clinical and environmental variables using programmatic calls to the open ICEES application programming interface. We then applied the approach to data on a large cohort (N = 22,365) of patients with asthma or related conditions and generated an eight-feature table. Due to regulatory constraints, data loss was incurred with the incorporation of each successive feature variable, from a starting sample size of N = 22,365 to a final sample size of N = 4,556 (20.4%), but data loss was < 10% until the addition of the final two feature variables. We then applied a generalized linear model to the subsequent dataset and focused on the impact of seven select feature variables on asthma exacerbations, defined as annual emergency department or inpatient visits for respiratory issues. We identified five feature variables-sex, race, obesity, prednisone, and airborne particulate exposure-as significant predictors of asthma exacerbations. We discuss the advantages and disadvantages of ICEES open multivariate analysis and conclude that, despite limitations, ICEES can provide a valuable resource for open multivariate analysis and can serve as an exemplar for regulatory-compliant informatic solutions to open patient data, with capabilities to explore the impact of environmental exposures on health outcomes.

19.
JMIR Med Inform ; 8(1): e16042, 2020 Jan 24.
Article in English | MEDLINE | ID: mdl-32012059

ABSTRACT

Computable phenotypes are algorithms that translate clinical features into code that can be run against electronic health record (EHR) data to define patient cohorts. However, computable phenotypes that only make use of structured EHR data do not capture the full richness of a patient's medical record. While natural language processing (NLP) methods have shown success in extracting clinical features from text, the use of such tools has generally been limited to research groups with substantial NLP expertise. Our goal was to develop an open-source phenotyping software, Clinical Annotation Research Kit (CLARK), that would enable clinical and translational researchers to use machine learning-based NLP for computable phenotyping without requiring deep informatics expertise. CLARK enables nonexpert users to mine text using machine learning classifiers by specifying features for the software to match in clinical notes. Once the features are defined, the user-friendly CLARK interface allows the user to choose from a variety of standard machine learning algorithms (linear support vector machine, Gaussian Naïve Bayes, decision tree, and random forest), cross-validation methods, and the number of folds (cross-validation splits) to be used in evaluation of the classifier. Example phenotypes where CLARK has been applied include pediatric diabetes (sensitivity=0.91; specificity=0.98), symptomatic uterine fibroids (positive predictive value=0.81; negative predictive value=0.54), nonalcoholic fatty liver disease (sensitivity=0.90; specificity=0.94), and primary ciliary dyskinesia (sensitivity=0.88; specificity=1.0). In each of these use cases, CLARK allowed investigators to incorporate variables into their phenotype algorithm that would not be available as structured data. Moreover, the fact that nonexpert users can get started with machine learning-based NLP with limited informatics involvement is a significant improvement over the status quo. 
We hope to disseminate CLARK to other organizations that may not have NLP or machine learning specialists available, enabling wider use of these methods.
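
The performance figures quoted in the abstract above (sensitivity, specificity, positive and negative predictive value) all derive from a confusion matrix. The sketch below shows the arithmetic on made-up labels; it is not CLARK code and uses none of its interfaces.

```python
from typing import Dict, Sequence


def diagnostic_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute sensitivity, specificity, PPV, and NPV for binary labels (1 = case)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

    def ratio(numerator: int, denominator: int) -> float:
        # Report 0.0 when a metric is undefined (empty denominator).
        return numerator / denominator if denominator else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Made-up chart-review labels versus classifier output.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = diagnostic_metrics(y_true, y_pred)
```

Sensitivity and specificity describe the classifier itself, while PPV and NPV also depend on how common the phenotype is in the evaluated cohort.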

20.
JMIR Med Inform ; 7(4): e15199, 2019 Oct 16.
Article in English | MEDLINE | ID: mdl-31621639

ABSTRACT

BACKGROUND: In a multisite clinical research collaboration, institutions may or may not use the same common data model (CDM) to store clinical data. To overcome this challenge, we proposed to use Health Level 7's Fast Healthcare Interoperability Resources (FHIR) as a meta-CDM-a single standard to represent clinical data. OBJECTIVE: In this study, we aimed to create an open-source application termed the Clinical Asset Mapping Program for FHIR (CAMP FHIR) to efficiently transform clinical data to FHIR for supporting source-agnostic CDM-to-FHIR mapping. METHODS: Mapping with CAMP FHIR involves (1) mapping each source variable to its corresponding FHIR element and (2) mapping each item in the source data's value sets to the corresponding FHIR value set item for variables with strict value sets. To date, CAMP FHIR has been used to transform 108 variables from the Informatics for Integrating Biology & the Bedside (i2b2) and Patient-Centered Outcomes Research Network data models to fields across 7 FHIR resources. It is designed to allow input from any source data model and will support additional FHIR resources in the future. RESULTS: We have used CAMP FHIR to transform data on approximately 23,000 patients with asthma from our institution's i2b2 database. Data quality and integrity were validated against the origin point of the data, our enterprise clinical data warehouse. CONCLUSIONS: We believe that CAMP FHIR can serve as an alternative to implementing new CDMs on a project-by-project basis. Moreover, the use of FHIR as a CDM could support rare data sharing opportunities, such as collaborations between academic medical centers and community hospitals. We anticipate adoption and use of CAMP FHIR to foster sharing of clinical data across institutions for downstream applications in translational research.
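
The two mapping steps described in the abstract above, (1) source variable to FHIR element and (2) source value to FHIR value set item, can be sketched with plain dictionaries. This is not the CAMP FHIR implementation: the source column names (`sex_cd`, `birth_date`), the source codes, and the fallback to "unknown" are hypothetical. The target side uses the FHIR `Patient.gender` and `Patient.birthDate` elements and the administrative gender codes `male`, `female`, `other`, and `unknown`.

```python
from typing import Any, Dict, List, Tuple

# Step 1: hypothetical source column -> (FHIR resource type, element name).
FIELD_MAP: Dict[str, Tuple[str, str]] = {
    "sex_cd": ("Patient", "gender"),
    "birth_date": ("Patient", "birthDate"),
}

# Step 2: hypothetical source codes -> FHIR value set items.
VALUE_SET_MAP: Dict[str, Dict[str, str]] = {
    "sex_cd": {"M": "male", "F": "female", "O": "other", "U": "unknown"},
}


def to_fhir(row: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn one source row into a list of FHIR-style resource dictionaries."""
    resources: Dict[str, Dict[str, Any]] = {}
    for column, value in row.items():
        if column not in FIELD_MAP:
            continue  # unmapped source variables are skipped in this sketch
        resource_type, element = FIELD_MAP[column]
        if column in VALUE_SET_MAP:
            # Assumption: unrecognized source codes fall back to "unknown".
            value = VALUE_SET_MAP[column].get(value, "unknown")
        resource = resources.setdefault(resource_type, {"resourceType": resource_type})
        resource[element] = value
    return list(resources.values())


patient = to_fhir({"sex_cd": "F", "birth_date": "1980-02-01", "zip": "27514"})
```

Keeping both mappings as data rather than code is what would let the same routine accept a different source data model by swapping in new dictionaries.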
