1.
J Biomed Inform ; 154: 104647, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38692465

ABSTRACT

OBJECTIVE: To use software, datasets, and data formats in the domain of Infectious Disease Epidemiology as a test collection to evaluate a novel M1 use case, which we introduce in this paper. M1 is a machine that, upon receipt of a new digital object of research, exhaustively finds all valid compositions of it with existing objects. METHOD: We implemented a data-format-matching-only M1 using exhaustive search, which we refer to as M1DFM. We then ran M1DFM on the test collection and used error analysis to identify needed semantic constraints. RESULTS: Precision of M1DFM search was 61.7%. Error analysis identified needed semantic constraints and needed changes in the handling of data services. Most semantic constraints were simple, but one data format was complex enough that representing semantic constraints over it was practically impossible, from which we conclude that, as a practical limitation, software developers will have to meet the machines halfway by engineering software whose inputs are simple enough that their semantic constraints can be represented, akin to the simple APIs of services. We summarize these insights as M1-FAIR guiding principles for composability and suggest a roadmap for progressively capable devices in the service of reuse and accelerated scientific discovery. CONCLUSION: Algorithmic search of digital repositories for valid workflow compositions has the potential to accelerate scientific discovery but requires a scalable solution to the problem of knowledge acquisition about semantic constraints on software inputs. Additionally, practical limitations on the logical complexity of semantic constraints must be respected, which has implications for the design of software.
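The data-format-matching-only search can be sketched as a purely syntactic matcher: it proposes a composition whenever one object's declared output format matches another object's declared input format, with no semantic checks. The object names and formats below are hypothetical, invented only to illustrate the idea.

```python
# Toy sketch of a data-format-matching-only M1 ("M1DFM"): exhaustively pair a
# new digital object with repository objects whose declared formats match.
# All names and formats here are illustrative, not from the paper's collection.

def find_compositions(new_obj, repository):
    """Return (producer, consumer) pairs linking new_obj with existing objects."""
    compositions = []
    for obj in repository:
        # Existing object's output feeds the new object's input.
        if obj["output_format"] in new_obj["input_formats"]:
            compositions.append((obj["name"], new_obj["name"]))
        # New object's output feeds an existing object's input.
        if new_obj["output_format"] in obj["input_formats"]:
            compositions.append((new_obj["name"], obj["name"]))
    return compositions

repo = [
    {"name": "case_counts", "input_formats": [], "output_format": "csv"},
    {"name": "seir_model", "input_formats": ["csv"], "output_format": "json"},
]
new = {"name": "plotter", "input_formats": ["json"], "output_format": "png"}
print(find_compositions(new, repo))  # [('seir_model', 'plotter')]
```

Because matching is format-only, the sketch also shows why precision suffers: any format coincidence yields a proposed composition, valid or not, which is exactly the gap the paper's semantic constraints are meant to close.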


Subject(s)
Software , Humans , Semantics , Machine Learning , Algorithms , Databases, Factual
2.
J Biomed Inform ; 153: 104642, 2024 May.
Article in English | MEDLINE | ID: mdl-38621641

ABSTRACT

OBJECTIVE: To develop a natural language processing (NLP) package to extract social determinants of health (SDoH) from clinical narratives, examine the bias among race and gender groups, test the generalizability of extracting SDoH for different disease groups, and examine population-level extraction ratio. METHODS: We developed SDoH corpora using clinical notes identified at the University of Florida (UF) Health. We systematically compared 7 transformer-based large language models (LLMs) and developed an open-source package, SODA (SOcial DeterminAnts), to facilitate SDoH extraction from clinical narratives. We examined the performance and potential bias of SODA for different race and gender groups, tested the generalizability of SODA using two disease domains including cancer and opioid use, and explored strategies for improvement. We applied SODA to extract 19 categories of SDoH from the breast (n = 7,971), lung (n = 11,804), and colorectal cancer (n = 6,240) cohorts to assess patient-level extraction ratio and examine the differences among race and gender groups. RESULTS: We developed an SDoH corpus using 629 clinical notes of cancer patients with annotations of 13,193 SDoH concepts/attributes from 19 categories of SDoH, and another cross-disease validation corpus using 200 notes from opioid use patients with 4,342 SDoH concepts/attributes. We compared 7 transformer models and the GatorTron model achieved the best mean average strict/lenient F1 scores of 0.9122 and 0.9367 for SDoH concept extraction and 0.9584 and 0.9593 for linking attributes to SDoH concepts. There is a small performance gap (∼4%) between male and female groups, but a large performance gap (>16%) among race groups. The performance dropped when we applied the cancer SDoH model to the opioid cohort; fine-tuning using a smaller opioid SDoH corpus improved the performance.
The extraction ratio varied in the three cancer cohorts, in which 10 SDoH could be extracted from over 70% of cancer patients, but 9 SDoH could be extracted from less than 70% of cancer patients. Individuals from the White and Black groups had a higher extraction ratio than other minority race groups. CONCLUSIONS: Our SODA package achieved good performance in extracting 19 categories of SDoH from clinical narratives. The SODA package with pre-trained transformer models is available at https://github.com/uf-hobi-informatics-lab/SODA_Docker.
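The patient-level extraction ratio reported above can be sketched as, for each SDoH category, the fraction of cohort patients with at least one extracted concept in that category. The patient IDs and categories below are invented for illustration.

```python
# Hypothetical sketch of the "extraction ratio": fraction of patients in a
# cohort with at least one extracted concept per SDoH category.
# Patient IDs and category names below are made up, not the study's data.

from collections import defaultdict

def extraction_ratio(extractions, cohort_size):
    """extractions: iterable of (patient_id, sdoh_category) tuples."""
    patients_per_category = defaultdict(set)
    for patient_id, category in extractions:
        patients_per_category[category].add(patient_id)
    return {cat: len(pats) / cohort_size
            for cat, pats in patients_per_category.items()}

ext = [(1, "tobacco_use"), (1, "employment"), (2, "tobacco_use"),
       (3, "tobacco_use"), (3, "housing")]
ratios = extraction_ratio(ext, cohort_size=4)
print(ratios["tobacco_use"])  # 0.75
```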


Subject(s)
Narration , Natural Language Processing , Social Determinants of Health , Humans , Female , Male , Bias , Electronic Health Records , Documentation/methods , Data Mining/methods
3.
Ann Surg ; 277(2): 179-185, 2023 02 01.
Article in English | MEDLINE | ID: mdl-35797553

ABSTRACT

OBJECTIVE: We test the hypothesis that for low-acuity surgical patients, postoperative intensive care unit (ICU) admission is associated with lower value of care compared with ward admission. BACKGROUND: Overtriaging low-acuity patients to ICU consumes valuable resources and may not confer better patient outcomes. Associations among postoperative overtriage, patient outcomes, costs, and value of care have not been previously reported. METHODS: In this longitudinal cohort study, postoperative ICU admissions were classified as overtriaged or appropriately triaged according to machine learning-based patient acuity assessments and requirements for immediate postoperative mechanical ventilation or vasopressor support. The nearest neighbors algorithm identified risk-matched control ward admissions. The primary outcome was value of care, calculated as inverse observed-to-expected mortality ratios divided by total costs. RESULTS: Acuity assessments had an area under the receiver operating characteristic curve of 0.92 in generating predictions for triage classifications. Of 8592 postoperative ICU admissions, 423 (4.9%) were overtriaged. These were matched with 2155 control ward admissions with similar comorbidities, incidence of emergent surgery, immediate postoperative vital signs, and do not resuscitate order placement and rescindment patterns. Compared with controls, overtriaged admissions did not have a lower incidence of any measured complications. Total costs for admission were $16.4K for overtriage and $15.9K for controls (P = 0.03). Value of care was lower for overtriaged admissions [2.9 (2.0-4.0)] compared with controls [24.2 (14.1-34.5), P < 0.001]. CONCLUSIONS: Low-acuity postoperative patients who were overtriaged to ICUs had increased total costs, no improvements in outcomes, and received low-value care.
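The value-of-care metric described above, the inverse observed-to-expected (O/E) mortality ratio divided by total costs, can be sketched directly. The per-$10K cost scaling and all counts below are assumptions for readability; they do not reproduce the study's reported values.

```python
# Sketch of the value-of-care metric: inverse O/E mortality ratio divided by
# total cost. Scaling cost per $10K is an assumption so magnitudes are
# readable; the death counts and costs below are invented, not the study's.

def value_of_care(observed_deaths, expected_deaths, total_cost_usd):
    oe_ratio = observed_deaths / expected_deaths
    return (1.0 / oe_ratio) / (total_cost_usd / 10_000)

# Overtriaged-like admission: mortality as expected, but higher cost.
print(value_of_care(5, 5, 16_400))   # ≈ 0.61
# Ward-control-like admission: fewer deaths than expected, lower cost.
print(value_of_care(2, 5, 15_900))   # ≈ 1.57
```

The design choice mirrors the abstract's logic: better-than-expected mortality raises value, higher spending lowers it, so overtriage (same outcomes, higher cost) scores lower.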


Subject(s)
Hospitalization , Intensive Care Units , Humans , Longitudinal Studies , Retrospective Studies , Cohort Studies
4.
Med Care ; 61(12 Suppl 2): S153-S160, 2023 12 01.
Article in English | MEDLINE | ID: mdl-37963035

ABSTRACT

PCORnet, the National Patient-Centered Clinical Research Network, provides the ability to conduct prospective and observational pragmatic research by leveraging standardized, curated electronic health records data together with patient and stakeholder engagement. PCORnet is funded by the Patient-Centered Outcomes Research Institute (PCORI) and is composed of 8 Clinical Research Networks that incorporate a total of 79 health system "sites." As the network developed, linkage to commercial health plans, federal insurance claims, disease registries, and other data resources demonstrated the value of extending the network's infrastructure to provide a more complete representation of patients' health and lived experiences. Initially, PCORnet studies avoided direct economic comparative effectiveness as a topic. However, PCORI's authorizing law was amended in 2019 to allow studies to incorporate patient-centered economic outcomes in primary research aims. With PCORI's expanded scope and PCORnet's phase 3 beginning in January 2022, there are opportunities to strengthen the network's ability to support economic patient-centered outcomes research. This commentary will discuss approaches that have been incorporated to date by the network and point to opportunities for the network to incorporate economic variables for analysis, informed by patient and stakeholder perspectives. Topics addressed include: (1) data linkage infrastructure; (2) commercial health plan partnerships; (3) Medicare and Medicaid linkage; (4) health system billing-based benchmarking; (5) area-level measures; (6) individual-level measures; (7) pharmacy benefits and retail pharmacy data; and (8) the importance of transparency and engagement while addressing the biases inherent in linking real-world data sources.


Subject(s)
Medicare , Patient Outcome Assessment , Aged , Humans , United States , Prospective Studies , Outcome Assessment, Health Care , Patient-Centered Care
5.
Metabolomics ; 19(2): 11, 2023 02 06.
Article in English | MEDLINE | ID: mdl-36745241

ABSTRACT

BACKGROUND: Liquid chromatography-high resolution mass spectrometry (LC-HRMS) is a popular approach for metabolomics data acquisition and requires many data processing software tools. The FAIR Principles - Findability, Accessibility, Interoperability, and Reusability - were proposed to promote open science and reusable data management, and to maximize the benefit obtained from contemporary and formal scholarly digital publishing. More recently, the FAIR principles were extended to include Research Software (FAIR4RS). AIM OF REVIEW: This study facilitates open science in metabolomics by providing an implementation solution for adopting FAIR4RS in LC-HRMS metabolomics data processing software. We believe our evaluation guidelines and results can help improve the FAIRness of research software. KEY SCIENTIFIC CONCEPTS OF REVIEW: We evaluated 124 LC-HRMS metabolomics data processing software tools obtained from a systematic review and selected 61 tools for detailed evaluation using FAIR4RS-related criteria, which were extracted from the literature along with internal discussions. We assigned each criterion one or more FAIR4RS categories through discussion. The minimum, median, and maximum percentages of criteria fulfillment across software were 21.6%, 47.7%, and 71.8%. Statistical analysis revealed no significant improvement in FAIRness over time. We identified four criteria that cover multiple FAIR4RS categories but had low fulfillment rates: (1) no software had semantic annotation of key information; (2) only 6.3% of evaluated software were registered to Zenodo and received DOIs; (3) only 14.5% of selected software provided official containerization or a virtual machine; (4) only 16.7% of evaluated software had fully documented functions in code. Based on these results, we discuss improvement strategies and future directions.
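The minimum/median/maximum criteria-fulfillment summary above can be reproduced in miniature as a percentage of met criteria per tool. The per-tool counts and the 48-criterion total below are invented; they only approximate the shape of the reported figures.

```python
# Illustrative sketch of the fulfillment summary: each tool's score is the
# percentage of FAIR4RS-related criteria it meets; min/median/max are then
# reported across tools. Counts and the 48-criterion total are invented.

import statistics

def fulfillment_pct(met, total_criteria):
    return 100.0 * met / total_criteria

scores = sorted(fulfillment_pct(m, 48) for m in [10, 21, 23, 30, 34])
summary = (min(scores), statistics.median(scores), max(scores))
print([round(s, 1) for s in summary])  # [20.8, 47.9, 70.8]
```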


Subject(s)
Metabolomics , Software , Metabolomics/methods , Chromatography, Liquid/methods , Mass Spectrometry/methods , Data Management
6.
Ann Surg ; 275(2): 332-339, 2022 02 01.
Article in English | MEDLINE | ID: mdl-34261886

ABSTRACT

OBJECTIVE: Develop unifying definitions and paradigms for data-driven methods to augment postoperative resource intensity decisions. SUMMARY BACKGROUND DATA: Postoperative level-of-care assignments and frequency of vital sign and laboratory measurements (ie, resource intensity) should align with patient acuity. Effective, data-driven decision-support platforms could improve value of care for millions of patients annually, but their development is hindered by the lack of salient definitions and paradigms. METHODS: Embase, PubMed, and Web of Science were searched for articles describing patient acuity and resource intensity after inpatient surgery. Study quality was assessed using validated tools. Thirty-five studies were included and assimilated according to PRISMA guidelines. RESULTS: Perioperative patient acuity is accurately represented by combinations of demographic, physiologic, and hospital-system variables as input features in models that capture complex, non-linear relationships. Intraoperative physiologic data enrich these representations. Triaging high-acuity patients to low-intensity care is associated with increased risk for mortality; triaging low-acuity patients to intensive care units (ICUs) has low value and imparts harm when other, valid requests for ICU admission are denied due to resource limitations, increasing their risk for unrecognized decompensation and failure-to-rescue. Providing high-intensity care for low-acuity patients may also confer harm through unnecessary testing and subsequent treatment of incidental findings, but there is insufficient evidence to evaluate this hypothesis. Compared with data-driven models, clinicians exhibit volatile performance in predicting complications and making postoperative resource intensity decisions.
CONCLUSION: To optimize value, postoperative resource intensity decisions should align with precise, data-driven patient acuity assessments augmented by models that accurately represent complex, non-linear relationships among risk factors.


Subject(s)
Health Resources , Patient Acuity , Surgical Procedures, Operative , Humans , Postoperative Period
7.
Environ Res ; 197: 111185, 2021 06.
Article in English | MEDLINE | ID: mdl-33901445

ABSTRACT

An individual's health and conditions are associated with a complex interplay between the individual's genetics and their exposures to both internal and external environments. Much attention has historically been placed on characterizing the genome; nevertheless, genetics only account for about 10% of an individual's health conditions, while the remainder appears to be determined by environmental factors and gene-environment interactions. To comprehensively understand the causes of diseases and prevent them, environmental exposures, especially the external exposome, need to be systematically explored. However, the heterogeneity of external exposome data sources (e.g., the same exposure variable named differently across data sources or, conversely, variables with the same or similar names that measure different exposures) increases the difficulty of analyzing and understanding the associations between environmental exposures and health outcomes. To solve this issue, the development of semantic standards using an ontology-driven approach is essential because ontologies can (1) provide an unambiguous and consistent understanding of the variables in heterogeneous data sources, and (2) explicitly express and model the context of the variables and relationships between those variables. We conducted a review of existing ontologies for the external exposome and found only four relevant ontologies. Further, the four existing ontologies are limited: they (1) often ignored the spatiotemporal characteristics of external exposome data, and (2) were developed in isolation from other conceptual frameworks (e.g., the socioecological model and the social determinants of health). Moving forward, the combination of multi-domain, multi-scale data (i.e., genome, phenome, and exposome at different granularities) with different conceptual frameworks will be the basis of future health outcomes research.


Subject(s)
Exposome , Causality , Environmental Exposure , Female , Humans , Male , Semantics
8.
Pharmacoepidemiol Drug Saf ; 29(11): 1393-1401, 2020 11.
Article in English | MEDLINE | ID: mdl-32844549

ABSTRACT

PURPOSE: Computable phenotypes are constructed to utilize data within the electronic health record (EHR) to identify patients with specific characteristics; a necessary step for researching a complex disease state. We developed computable phenotypes for resistant hypertension (RHTN) and stable controlled hypertension (HTN) based on the National Patient-Centered Clinical Research Network (PCORnet) common data model (CDM). The computable phenotypes were validated through manual chart review. METHODS: We adapted and refined existing computable phenotype algorithms for RHTN and stable controlled HTN to the PCORnet CDM in an adult HTN population from the OneFlorida Clinical Research Consortium (2015-2017). Two independent reviewers validated the computable phenotypes through manual chart review of 425 patient records. We assessed precision of our computable phenotypes through positive predictive value (PPV) and test validity through interrater reliability (IRR). RESULTS: Among the 156,730 HTN patients in our final dataset, the final computable phenotype algorithms identified 24,926 patients with RHTN and 19,100 with stable controlled HTN. The PPV for RHTN in patients randomly selected for validation of the final algorithm was 99.1% (n = 113, CI: 95.2%-99.9%). The PPV for stable controlled HTN in patients randomly selected for validation of the final algorithm was 96.5% (n = 113, CI: 91.2%-99.0%). IRR analysis revealed a raw percent agreement of 91% (152/167) with Cohen's kappa statistic = 0.87. CONCLUSIONS: We constructed and validated a RHTN computable phenotype algorithm and a stable controlled HTN computable phenotype algorithm. Both algorithms are based on the PCORnet CDM, allowing for future application to epidemiological and drug utilization-based research.
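The two validation statistics above can be sketched directly: PPV from chart review of algorithm-flagged patients, and Cohen's kappa from raw agreement and each reviewer's positive-call counts. The PPV counts below are chosen to mirror the reported n = 113 sample; the per-reviewer counts feeding kappa are invented, so the resulting kappa differs from the study's 0.87.

```python
# Sketch of the validation statistics: PPV of algorithm-flagged patients, and
# Cohen's kappa for interrater reliability. PPV counts mirror the reported
# n = 113; the reviewer positive-call counts (80, 85) are invented.

def ppv(true_positives, false_positives):
    return true_positives / (true_positives + false_positives)

def cohens_kappa(agree, total, rater1_pos, rater2_pos):
    """Kappa from raw agreement and each rater's positive-call counts."""
    p_o = agree / total  # observed agreement
    # Chance agreement: both call positive, plus both call negative.
    p_e = (rater1_pos / total) * (rater2_pos / total) \
        + ((total - rater1_pos) / total) * ((total - rater2_pos) / total)
    return (p_o - p_e) / (1 - p_e)

print(round(ppv(112, 1), 3))                     # 0.991
print(round(cohens_kappa(152, 167, 80, 85), 2))  # 0.82
```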


Subject(s)
Drug Resistance , Electronic Health Records , Hypertension , Adult , Algorithms , Female , Humans , Hypertension/diagnosis , Hypertension/drug therapy , Hypertension/epidemiology , Phenotype , Reproducibility of Results
9.
BMC Med Inform Decis Mak ; 20(1): 258, 2020 10 08.
Article in English | MEDLINE | ID: mdl-33032576

ABSTRACT

BACKGROUND: The symbiotic interactions that occur between humans and organisms in our environment have a tremendous impact on our health. Recently, there has been a surge in interest in understanding the complex relationships between the microbiome and human health and host immunity against microbial pathogens, among other things. To collect and manage data about these interactions and their complexity, scientists will need ontologies that represent symbiotic interactions as they occur in reality. METHODS: We began with two papers that reviewed the usage of 'symbiosis' and related terms in the biology and ecology literature and prominent textbooks. We then analyzed several prominent standard terminologies and ontologies that contain representations of symbiotic interactions, to determine if they appropriately defined 'symbiosis' and related terms according to current scientific usage as identified by the review papers. In the process, we identified several subtypes of symbiotic interactions, as well as the characteristics that differentiate them, which we used to propose textual and axiomatic definitions for each subtype of interaction. To both illustrate how to use the ontological representations and definitions we created and provide additional quality assurance on key definitions, we carried out a referent tracking analysis and representation of three scenarios involving symbiotic interactions among organisms. RESULTS: We found one definition of 'symbiosis' in an existing ontology that was consistent with the vast preponderance of scientific usage in biology and ecology. However, that ontology changed its definition during the course of our work, and discussions are ongoing. We present a new definition that we have proposed. We also define 34 subtypes of symbiosis. 
Our referent tracking analysis showed that it is necessary to define symbiotic interactions at the level of the individual, rather than at the species level, due to the complex nature in which organisms can go from participating in one type of symbiosis with one organism to participating in another type of symbiosis with a different organism. CONCLUSION: As a result of our efforts here, we have developed a robust representation of symbiotic interactions using a realism-based approach, which fills a gap in existing biomedical ontologies.


Subject(s)
Biological Ontologies , Symbiosis , Humans
10.
BMC Bioinformatics ; 20(Suppl 21): 708, 2019 Dec 23.
Article in English | MEDLINE | ID: mdl-31865907

ABSTRACT

BACKGROUND: The Drug Ontology (DrOn) is a modular, extensible ontology of drug products, their ingredients, and their biological activity created to enable comparative effectiveness and health services researchers to query National Drug Codes (NDCs) that represent products by ingredient, by molecular disposition, by therapeutic disposition, and by physiological effect (e.g., diuretic). It is based on the RxNorm drug terminology maintained by the U.S. National Library of Medicine, and on the Chemical Entities of Biological Interest ontology. Both NDCs and RxNorm unique concept identifiers (RxCUIs) can undergo changes over time that can obfuscate their meaning when these identifiers occur in historic data. We present a new approach to modeling these entities within DrOn that will allow users of DrOn working with historic prescription data to more easily and correctly interpret that data. RESULTS: We have implemented a full accounting of NDCs and RxCUIs as information content entities, and of the processes involved in managing their creation and changes. This includes an OWL file that implements and defines the classes necessary to model these entities. A separate file contains an instance-level prototype in OWL that demonstrates the feasibility of this approach to representing NDCs and RxCUIs and the processes of managing them by retrieving and representing several individual NDCs, both active and inactive, and the RxCUIs to which they are connected. We also demonstrate how historic information about these identifiers in DrOn can be easily retrieved using a simple SPARQL query. CONCLUSIONS: An accurate model of how these identifiers operate in reality is a valuable addition to DrOn that enhances its usefulness as a knowledge management resource for working with historic data.
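The kind of historic lookup the prototype enables can be sketched in miniature: an identifier carries a history of status periods linked to a concept, and a query asks what was true on a given date. The actual DrOn artifact is OWL queried via SPARQL; plain Python dictionaries stand in here, and the NDC, RxCUI, and dates are all placeholders, not real RxNorm entries.

```python
# Hypothetical sketch of identifier-history lookup (the real DrOn prototype
# uses OWL + SPARQL). The NDC "00000-0000-00", RxCUI "123456", and all dates
# below are invented placeholders, not real drug identifiers.

from datetime import date

ndc_history = {
    "00000-0000-00": [
        {"from": date(2005, 1, 1), "to": date(2012, 6, 30),
         "status": "active", "rxcui": "123456"},
        {"from": date(2012, 7, 1), "to": None,  # open-ended period
         "status": "inactive", "rxcui": "123456"},
    ],
}

def lookup(ndc, on_date):
    """Return (status, rxcui) for an NDC as of a historic date, else None."""
    for period in ndc_history[ndc]:
        if period["from"] <= on_date and (period["to"] is None
                                          or on_date <= period["to"]):
            return period["status"], period["rxcui"]
    return None

print(lookup("00000-0000-00", date(2010, 3, 15)))  # ('active', '123456')
```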


Subject(s)
Vocabulary, Controlled , Biological Ontologies , National Library of Medicine (U.S.) , RxNorm , Semantics , United States
11.
Ann Surg ; 269(4): 652-662, 2019 04.
Article in English | MEDLINE | ID: mdl-29489489

ABSTRACT

OBJECTIVE: To accurately calculate the risk for postoperative complications and death after surgery in the preoperative period using machine-learning modeling of clinical data. BACKGROUND: Postoperative complications cause a 2-fold increase in 30-day mortality and cost, and are associated with long-term consequences. The ability to precisely forecast the risk for major complications before surgery is limited. METHODS: In a single-center cohort of 51,457 surgical patients undergoing major inpatient surgery, we have developed and validated an automated analytics framework for a preoperative risk algorithm (MySurgeryRisk) that uses existing clinical data in electronic health records to forecast patient-level probabilistic risk scores for 8 major postoperative complications (acute kidney injury, sepsis, venous thromboembolism, intensive care unit admission >48 hours, mechanical ventilation >48 hours, wound, neurologic, and cardiovascular complications) and death up to 24 months after surgery. We used the area under the receiver operating characteristic curve (AUC) and predictiveness curves to evaluate model performance. RESULTS: MySurgeryRisk calculates probabilistic risk scores for 8 postoperative complications with AUC values ranging between 0.82 and 0.94 [99% confidence intervals (CIs) 0.81-0.94]. The model predicts the risk for death at 1, 3, 6, 12, and 24 months with AUC values ranging between 0.77 and 0.83 (99% CI 0.76-0.85). CONCLUSIONS: We constructed an automated predictive analytics framework for a machine-learning algorithm with high discriminatory ability for assessing the risk of surgical complications and death using readily available preoperative electronic health records data. The feasibility of implementing this novel algorithm in real-time clinical workflows requires further testing.
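The discrimination metric used above, AUC, can be sketched via its rank (Mann-Whitney) interpretation: the probability that a randomly chosen patient who had the outcome receives a higher risk score than a randomly chosen patient who did not. The scores below are made up, not MySurgeryRisk outputs.

```python
# Sketch of AUC via the Mann-Whitney/rank formulation: the fraction of
# (positive, negative) pairs where the positive case scores higher (ties
# count half). The risk scores below are invented for illustration.

def auc(scores_pos, scores_neg):
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

complication = [0.91, 0.72, 0.88, 0.60]          # scores, outcome occurred
no_complication = [0.20, 0.45, 0.55, 0.72, 0.10]  # scores, no outcome
print(auc(complication, no_complication))  # 0.925
```

The pairwise version is quadratic; production code would typically use a rank-based or library implementation, but the probability being estimated is the same.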


Subject(s)
Algorithms , Machine Learning , Postoperative Complications/epidemiology , Risk Assessment/methods , Humans , Postoperative Complications/mortality , Preoperative Period
12.
BMC Med Inform Decis Mak ; 19(Suppl 5): 232, 2019 12 05.
Article in English | MEDLINE | ID: mdl-31801524

ABSTRACT

BACKGROUND: De-identification is a critical technology to facilitate the use of unstructured clinical text while protecting patient privacy and confidentiality. The clinical natural language processing (NLP) community has invested great efforts in developing methods and corpora for de-identification of clinical notes. These annotated corpora are valuable resources for developing automated systems to de-identify clinical text at local hospitals. However, existing studies often utilized training and test data collected from the same institution. There are few studies that explore automated de-identification under cross-institute settings. The goal of this study is to examine deep learning-based de-identification methods in a cross-institute setting, identify the bottlenecks, and provide potential solutions. METHODS: We created a de-identification corpus using a total of 500 clinical notes from the University of Florida (UF) Health, developed deep learning-based de-identification models using the 2014 i2b2/UTHealth corpus, and evaluated the performance using the UF corpus. We compared five different word embeddings trained from general English text, clinical text, and biomedical literature, explored lexical and linguistic features, and compared two strategies to customize the deep learning models using UF notes and resources. RESULTS: Pre-trained word embeddings using a general English corpus achieved better performance than embeddings from de-identified clinical text and biomedical literature. The performance of deep learning models trained using only the i2b2 corpus significantly dropped (strict and relaxed F1 scores dropped from 0.9547 and 0.9646 to 0.8568 and 0.8958) when applied to another corpus annotated at UF Health. Linguistic features could further improve the performance of de-identification in cross-institute settings. After customizing the models using UF notes and resources, the best model achieved strict and relaxed F1 scores of 0.9288 and 0.9584, respectively.
CONCLUSIONS: It is necessary to customize de-identification models using local clinical text and other resources when applied in cross-institute settings. Fine-tuning is a potential solution to re-use pre-trained parameters and reduce the training time to customize deep learning-based de-identification models trained using clinical corpus from a different institution.
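The strict versus relaxed evaluation above is a standard span-level distinction: strict matching requires exact boundary agreement with a gold annotation, while relaxed matching accepts any character overlap. The character offsets below are invented to illustrate the difference.

```python
# Sketch of strict vs. relaxed span-level F1 for de-identification: strict
# requires exact span boundaries, relaxed counts any overlap with a gold
# span. The gold/predicted character offsets below are illustrative only.

def f1(gold, pred, relaxed=False):
    def match(g, p):
        if relaxed:
            return g[0] < p[1] and p[0] < g[1]  # any character overlap
        return g == p                           # exact boundary match
    precision = sum(any(match(g, p) for g in gold) for p in pred) / len(pred)
    recall = sum(any(match(g, p) for p in pred) for g in gold) / len(gold)
    return 2 * precision * recall / (precision + recall)

gold = [(0, 4), (10, 18), (25, 30)]  # gold PHI character spans
pred = [(0, 4), (11, 18)]            # system predictions
print(round(f1(gold, pred), 3))                 # strict: 0.4
print(round(f1(gold, pred, relaxed=True), 3))   # relaxed: 0.8
```

The gap between the two numbers is why papers report both: relaxed credit rewards finding the PHI at all, strict credit rewards redacting exactly the right characters.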


Subject(s)
Data Anonymization , Deep Learning , Confidentiality , Electronic Health Records , Humans , Linguistics , Natural Language Processing
13.
J Med Internet Res ; 20(4): e137, 2018 04 12.
Article in English | MEDLINE | ID: mdl-29650502

ABSTRACT

BACKGROUND: Older patients with multiple chronic conditions are often faced with increased health care needs and subsequent higher medical costs, posing significant financial burden to patients, their caregivers, and the health care system. The increasing adoption of electronic health record systems and the proliferation of clinical data offer new opportunities for prevalence studies and for population health assessment. The last few years have witnessed an increasing number of clinical research networks focused on building large collections of clinical data from electronic health records and claims to make it easier and less costly to conduct clinical research. OBJECTIVE: The aim of this study was to compare the prevalence of common chronic conditions and multiple chronic conditions in older adults between Florida and the United States using data from the OneFlorida Clinical Research Consortium and the Healthcare Cost and Utilization Project (HCUP) National Inpatient Sample (NIS). METHODS: We first analyzed the basic demographic characteristics of the older adults in 3 datasets: the 2013 OneFlorida data, the 2013 HCUP NIS data, and the combined 2012 to 2016 OneFlorida data. Then we analyzed the prevalence of each of the 25 chronic conditions in each of the 3 datasets. We stratified the analysis of older adults with hypertension, the most prevalent condition. Additionally, we examined trends (ie, overall trends and then by age, race, and gender) in the prevalence of discharge records representing multiple chronic conditions over time for the OneFlorida (2012-2016) and HCUP NIS cohorts (2003-2013). RESULTS: The rankings of the top 10 prevalent conditions are the same across the OneFlorida and HCUP NIS datasets.
The most prevalent 2-condition combinations among the 3 datasets were: hyperlipidemia and hypertension; hypertension and ischemic heart disease; diabetes and hypertension; chronic kidney disease and hypertension; anemia and hypertension; and hyperlipidemia and ischemic heart disease. We observed increasing trends in multiple chronic conditions in both data sources. CONCLUSIONS: The results showed that chronic conditions and multiple chronic conditions are prevalent in older adults across Florida and the United States. Even though slight differences were observed, the similar estimates of prevalence of chronic conditions and multiple chronic conditions across OneFlorida and HCUP NIS suggested that clinical research data networks such as OneFlorida, built from heterogeneous data sources, can provide rich data resources for conducting large-scale secondary data analyses.
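Pairwise multiple-chronic-condition prevalence of the kind reported above can be sketched by tallying condition pairs per patient. The three toy patient records below are invented and far smaller than either cohort.

```python
# Sketch of tallying 2-condition combination prevalence from per-patient
# condition sets. The three patient records below are invented toy data.

from itertools import combinations
from collections import Counter

patients = [
    {"hypertension", "hyperlipidemia", "diabetes"},
    {"hypertension", "hyperlipidemia"},
    {"hypertension", "ischemic heart disease"},
]

pair_counts = Counter(
    pair
    for conditions in patients
    for pair in combinations(sorted(conditions), 2)  # sorted -> canonical pair
)
top_pair, n = pair_counts.most_common(1)[0]
print(top_pair, round(n / len(patients), 2))
```

Sorting each patient's conditions before pairing makes (A, B) and (B, A) count as the same combination, which is what a prevalence tally requires.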


Subject(s)
Electronic Health Records/trends , Multiple Chronic Conditions/psychology , Aged , Aged, 80 and over , Female , Florida , Humans , Inpatients , Male , Prevalence , United States
14.
J Biomed Inform ; 66: 42-51, 2017 02.
Article in English | MEDLINE | ID: mdl-28007583

ABSTRACT

BACKGROUND: The last few years have witnessed an increasing number of clinical research networks (CRNs) focused on building large collections of data from electronic health records (EHRs), claims, and patient-reported outcomes (PROs). Many of these CRNs provide a service for the discovery of research cohorts with various health conditions, which is especially useful for rare diseases. Supporting patient privacy can enhance the scalability and efficiency of such processes; however, current practice mainly relies on policy, such as guidelines defined in the Health Insurance Portability and Accountability Act (HIPAA), which are insufficient for CRNs (e.g., HIPAA does not require encryption of data, a measure that can mitigate insider threats). By combining policy with privacy enhancing technologies we can enhance the trustworthiness of CRNs. The goal of this research is to determine if searchable encryption can instill privacy in CRNs without sacrificing their usability. METHODS: We developed a technique, implemented in working software, to enable privacy-preserving cohort discovery (PPCD) services in large distributed CRNs based on elliptic curve cryptography (ECC). This technique also incorporates a block indexing strategy to improve the performance (in terms of computational running time) of PPCD. We evaluated the PPCD service with three real cohort definitions of varied query complexity: (1) elderly cervical cancer patients who underwent radical hysterectomy, (2) oropharyngeal and tongue cancer patients who underwent robotic transoral surgery, and (3) female breast cancer patients who underwent mastectomy. These definitions were tested in an encrypted database of 7.1 million records derived from the publicly available Healthcare Cost and Utilization Project (HCUP) Nationwide Inpatient Sample (NIS).
We assessed the performance of the PPCD service in terms of (1) accuracy in cohort discovery, (2) computational running time, and (3) privacy afforded to the underlying records during PPCD. RESULTS: The empirical results indicate that the proposed PPCD can execute cohort discovery queries in a reasonable amount of time, with query runtime in the range of 165-262s for the 3 use cases, with zero compromise in accuracy. We further show that the search performance is practical because it supports a highly parallelized design for secure evaluation over encrypted records. Additionally, our security analysis shows that the proposed construction is resilient to standard adversaries. CONCLUSIONS: PPCD services can be designed for clinical research networks. The security construction presented in this work specifically achieves high privacy guarantees by preventing both threats originating from within and beyond the network.
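The core idea of searchable-encryption-based cohort discovery can be illustrated with a deliberately simplified toy: the server stores only keyed hashes of record attributes, and a keyed query token matches records without revealing the underlying codes to the server. This HMAC sketch is NOT the paper's elliptic-curve construction (and lacks its security properties); the key, record IDs, and codes are all invented.

```python
# Toy illustration of searchable-index cohort discovery (NOT the paper's
# ECC-based construction): the server holds only keyed hashes ("tags") of
# attribute codes, so plaintext codes never appear server-side.
# The key, record IDs, and ICD-9/CPT codes below are invented.

import hashlib
import hmac

KEY = b"shared-secret-held-by-data-contributors"  # hypothetical key

def tag(code):
    return hmac.new(KEY, code.encode(), hashlib.sha256).hexdigest()

# Server-side encrypted index: record id -> set of attribute tags.
index = {
    "rec1": {tag("ICD9:180.9"), tag("CPT:58210")},  # toy cervical-cancer record
    "rec2": {tag("ICD9:174.9")},                    # toy breast-cancer record
}

def cohort_query(codes):
    """Match records containing ALL queried codes, using tags only."""
    tokens = {tag(c) for c in codes}
    return [rid for rid, tags in index.items() if tokens <= tags]

print(cohort_query(["ICD9:180.9", "CPT:58210"]))  # ['rec1']
```

A real construction must also resist frequency analysis and support revocation, which is where the paper's ECC scheme and block indexing come in; this sketch only conveys the query-token idea.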


Subject(s)
Computer Security , Electronic Health Records , Health Insurance Portability and Accountability Act , Confidentiality , Female , Humans , United States
15.
Am J Hypertens ; 37(1): 60-68, 2024 Jan 01.
Article in English | MEDLINE | ID: mdl-37712350

ABSTRACT

BACKGROUND: Apparent treatment-resistant hypertension (aTRH) is defined as uncontrolled blood pressure (BP) despite using ≥3 antihypertensive classes or controlled BP while using ≥4 antihypertensive classes. Patients with aTRH have a higher risk for adverse cardiovascular outcomes compared with patients with controlled hypertension (HTN). Although there have been prior reports on the prevalence, characteristics, and predictors of aTRH, these have been broadly derived from smaller datasets, randomized controlled trials, or closed healthcare systems. METHODS: We extracted patients with HTN defined by ICD-9 and ICD-10 codes during 1/1/2015-12/31/2018, from 2 large electronic health record databases: the OneFlorida Data Trust (n = 223,384) and Research Action for Health Network (REACHnet) (n = 175,229). We applied our previously validated aTRH and stable controlled HTN computable phenotype algorithms and performed univariate and multivariate analyses to identify the prevalence, characteristics, and predictors of aTRH in these populations. RESULTS: The prevalence of aTRH among patients with HTN in OneFlorida (16.7%) and REACHnet (11.3%) was similar to prior reports. Both populations had a significantly higher proportion of Black patients with aTRH compared with those with stable controlled HTN. aTRH in both populations shared similar significant predictors, including Black race, diabetes, heart failure, chronic kidney disease, cardiomegaly, and higher body mass index. In both populations, aTRH was significantly associated with similar comorbidities, when compared with stable controlled HTN. CONCLUSIONS: In 2 large, diverse real-world populations, we observed similar comorbidities and predictors of aTRH as prior studies. In the future, these results may be used to improve healthcare professionals' understanding of aTRH predictors and associated comorbidities.
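The aTRH and stable controlled HTN definitions stated above are simple enough to sketch as a computable phenotype rule. The function below only encodes the definition in this abstract; it is not the authors' validated algorithm, which would also handle medication adherence, dosing, and exclusion criteria.

```python
def classify_htn(bp_controlled: bool, n_med_classes: int) -> str:
    """Classify a hypertensive patient per the abstract's definitions:
    aTRH = uncontrolled BP on >=3 antihypertensive classes, or
           controlled BP on >=4 classes."""
    if (not bp_controlled and n_med_classes >= 3) or \
       (bp_controlled and n_med_classes >= 4):
        return "aTRH"
    if bp_controlled and 1 <= n_med_classes <= 3:
        return "stable controlled HTN"
    return "other"

print(classify_htn(bp_controlled=False, n_med_classes=3))  # aTRH
```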


Subject(s)
Antihypertensive Agents , Hypertension , Humans , Antihypertensive Agents/therapeutic use , Antihypertensive Agents/pharmacology , Electronic Health Records , Risk Factors , Hypertension/diagnosis , Hypertension/drug therapy , Hypertension/epidemiology , Blood Pressure , Prevalence
16.
Ann Surg Open ; 5(2): e429, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38911666

ABSTRACT

Objective: To determine whether certain patients are vulnerable to errant triage decisions immediately after major surgery and whether there are unique sociodemographic phenotypes within overtriaged and undertriaged cohorts. Background: In a fair system, overtriage of low-acuity patients to intensive care units (ICUs) and undertriage of high-acuity patients to general wards would affect all sociodemographic subgroups equally. Methods: This multicenter, longitudinal cohort study of hospital admissions immediately after major surgery compared hospital mortality and value of care (risk-adjusted mortality/total costs) across 4 cohorts: overtriage (N = 660), risk-matched overtriage controls admitted to general wards (N = 3077), undertriage (N = 2335), and risk-matched undertriage controls admitted to ICUs (N = 4774). K-means clustering identified sociodemographic phenotypes within overtriage and undertriage cohorts. Results: Compared with controls, overtriaged admissions had a predominance of male patients (56.2% vs 43.1%, P < 0.001) and commercial insurance (6.4% vs 2.5%, P < 0.001); undertriaged admissions had a predominance of Black patients (28.4% vs 24.4%, P < 0.001) and greater socioeconomic deprivation. Overtriage was associated with increased total direct costs [$16.2K ($11.4K-$23.5K) vs $14.1K ($9.1K-$20.7K), P < 0.001] and low value of care; undertriage was associated with increased hospital mortality (1.5% vs 0.7%, P = 0.002) and hospice care (2.2% vs 0.6%, P < 0.001) and low value of care. Unique sociodemographic phenotypes within both overtriage and undertriage cohorts had similar outcomes and value of care, suggesting that triage decisions, rather than patient characteristics, drive outcomes and value of care. Conclusions: Postoperative triage decisions should ensure equality across sociodemographic groups by anchoring triage decisions to objective patient acuity assessments, circumventing cognitive shortcuts and mitigating bias.
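The sociodemographic phenotyping step above uses K-means clustering. A minimal sketch of Lloyd's algorithm on hypothetical encoded features (age and a deprivation index, both scaled to [0, 1]) shows how such phenotypes separate; the study's actual features, scaling, and implementation are not specified here.

```python
def kmeans(points, k, iters=20):
    # Lloyd's algorithm, seeded with the first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: centroid = mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical patients as [age/100, deprivation index]
pts = [[0.65, 0.1], [0.70, 0.2], [0.40, 0.9], [0.35, 0.8]]
print(kmeans(pts, 2))  # the two age/deprivation groups separate
```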

17.
Sci Rep ; 14(1): 7831, 2024 04 03.
Article in English | MEDLINE | ID: mdl-38570569

ABSTRACT

The objective of this study is to develop and evaluate natural language processing (NLP) and machine learning models to predict infant feeding status from clinical notes in the Epic electronic health records system. The primary outcome was the classification of infant feeding status from clinical notes using Medical Subject Headings (MeSH) terms. Annotation of notes was completed using TeamTat to uniquely classify clinical notes according to infant feeding status. We trained 6 machine learning models to classify infant feeding status, including logistic regression, random forest, XGBoost gradient boosting, k-nearest neighbors, and support-vector classifiers. Model comparison was evaluated based on overall accuracy, precision, recall, and F1 score. Our modeling corpus included a balanced sample of clinical notes across each class. We manually reviewed 999 notes that represented 746 mother-infant dyads with a mean gestational age of 38.9 weeks and a mean maternal age of 26.6 years. The most frequent feeding status classification present for this study was exclusive breastfeeding [n = 183 (18.3%)], followed by exclusive formula bottle feeding [n = 146 (14.6%)], and exclusive feeding of expressed mother's milk [n = 102 (10.2%)], with mixed feeding being the least frequent [n = 23 (2.3%)]. Our final analysis evaluated the classification of clinical notes as breast, formula/bottle, and missing. The machine learning models were trained on these three classes after balancing and downsampling. The XGBoost model outperformed all others by achieving an accuracy of 90.1%, a macro-averaged precision of 90.3%, a macro-averaged recall of 90.1%, and a macro-averaged F1 score of 90.1%. Our results demonstrate that natural language processing can be applied to clinical notes stored in electronic health records to classify infant feeding status.
Early identification of breastfeeding status using NLP on unstructured electronic health records data can be used to inform precision public health interventions focused on improving lactation support for postpartum patients.
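As an illustration of the three-class target (breast, formula/bottle, missing) rather than the paper's trained models, a rule-based keyword baseline over note text might look like the sketch below; the lexicons are hypothetical.

```python
import re

# Hypothetical keyword lexicons; the study instead trained machine
# learning models on MeSH-annotated notes.
LEXICON = {
    "breast": {"breastfeeding", "breastfed", "latching", "expressed"},
    "formula/bottle": {"formula", "bottle"},
}

def classify_note(text: str) -> str:
    # Tokenize to lowercase words and count lexicon hits per class.
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    hits = {label: len(tokens & kws) for label, kws in LEXICON.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else "missing"

print(classify_note("Infant latching well, breastfeeding q3h"))  # breast
```

A baseline like this is a common sanity check before training supervised models on the annotated corpus.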


Subject(s)
Machine Learning , Natural Language Processing , Female , Humans , Infant , Software , Electronic Health Records , Mothers
18.
J Biomed Inform ; 46(1): 40-6, 2013 Feb.
Article in English | MEDLINE | ID: mdl-22981843

ABSTRACT

Recent studies have clearly demonstrated a shift towards collaborative research and team science approaches across a spectrum of disciplines. Such collaborative efforts have also been acknowledged and nurtured by popular extramurally funded programs, including the Clinical and Translational Science Award (CTSA) conferred by the National Institutes of Health. Since its inception, the number of CTSA awardees has steadily increased to 60 institutes across 30 states. One of the objectives of the CTSA is to accelerate translation of research from bench to bedside to community and to train a new genre of researchers under the translational research umbrella. Feasibility of such translation implicitly demands multi-disciplinary collaboration and mentoring. Networks have proven to be convenient abstractions for studying research collaborations. The present study is a part of the CTSA baseline study and investigates the existence of possible community-structure in Biomedical Research Grant Collaboration (BRGC) networks across data sets retrieved from the internally developed grants management system, the Automated Research Information Administrator (ARIA), at the University of Arkansas for Medical Sciences (UAMS). Fastgreedy and link-community community-structure detection algorithms were used to investigate the presence of non-overlapping and overlapping community-structure and their variation across the years 2006 and 2009. A surrogate testing approach in conjunction with appropriate discriminant statistics, namely the modularity index and the maximum partition density, is proposed to investigate whether the community-structure of the BRGC networks was different from that generated by certain types of random graphs. Non-overlapping as well as overlapping community-structure detection algorithms indicated the presence of community-structure in the BRGC network.
Subsequent surrogate testing revealed that the random graph models considered in the present study may not necessarily be appropriate generative mechanisms of the community-structure in the BRGC networks. The discrepancy in community-structure between the BRGC networks and the random graph surrogates was especially pronounced in 2009 as opposed to 2006, indicating a possible shift towards team science and the formation of non-trivial modular patterns with time. The results also clearly demonstrate the presence of inter-departmental and multi-disciplinary collaborations in BRGC networks. While the results are presented on BRGC networks as a part of the CTSA baseline study at UAMS, the proposed methodologies are generic, with potential to be extended across other CTSA organizations. Understanding the presence of community-structure can supplement more traditional network analysis because it is useful in identifying research teams and their inter-connections, as opposed to the role of individual nodes in the network. Such an understanding can be a critical step prior to devising meaningful interventions for promoting team science, multi-disciplinary collaborations, and cross-fertilization of ideas across research teams, and for identifying suitable mentors. Understanding the temporal evolution of these communities may also be useful in CTSA evaluation.
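The modularity index used as a discriminant statistic above can be computed directly from Newman's definition Q = (1/2m) Σ_ij [A_ij − k_i k_j/(2m)] δ(c_i, c_j). The self-contained sketch below computes Q for a toy graph with an obvious two-community structure; it illustrates the statistic only, not the fastgreedy or link-community algorithms used in the study.

```python
def modularity(edges, communities):
    """Newman modularity Q for an undirected simple graph given as an
    edge list and a partition of its nodes into communities."""
    m = len(edges)
    deg, adj, nodes = {}, set(), set()
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
        adj.add(frozenset((u, v)))
        nodes.update((u, v))
    comm = {n: c for c, members in enumerate(communities) for n in members}
    q = 0.0
    for i in nodes:
        for j in nodes:
            if comm[i] == comm[j]:
                a = 1.0 if frozenset((i, j)) in adj else 0.0
                q += a - deg[i] * deg[j] / (2 * m)
    return q / (2 * m)

# Two 3-cliques joined by a single bridge edge: clear two-community structure.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(round(modularity(edges, [{0, 1, 2}, {3, 4, 5}]), 3))  # → 0.357
```

Surrogate testing then compares such Q values against the distribution obtained from random graphs, where the trivial one-community partition yields Q = 0.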


Subject(s)
Cooperative Behavior , Research Support as Topic
19.
medRxiv ; 2023 May 01.
Article in English | MEDLINE | ID: mdl-37205447

ABSTRACT

Background: Apparent treatment-resistant hypertension (aTRH) is defined as uncontrolled blood pressure (BP) despite using ≥3 antihypertensive classes or controlled BP while using ≥4 antihypertensive classes. Patients with aTRH have a higher risk for adverse cardiovascular outcomes compared to patients with controlled hypertension. Although there have been prior reports on the prevalence, characteristics, and predictors of aTRH, these have been broadly derived from smaller datasets, randomized controlled trials, or closed healthcare systems. Methods: We extracted patients with hypertension defined by ICD-9 and ICD-10 codes during 1/1/2015-12/31/2018, from two large electronic health record databases: the OneFlorida Data Trust (n=223,384) and Research Action for Health Network (REACHnet) (n=175,229). We applied our previously validated aTRH and stable controlled hypertension (HTN) computable phenotype algorithms and performed univariate and multivariate analyses to identify the prevalence, characteristics, and predictors of aTRH in these real-world populations. Results: The prevalence of aTRH in OneFlorida (16.7%) and REACHnet (11.3%) was similar to prior reports. Both populations had a significantly higher proportion of Black patients with aTRH compared to those with stable controlled HTN. aTRH in both populations shared similar significant predictors, including Black race, diabetes, heart failure, chronic kidney disease, cardiomegaly, and higher body mass index. In both populations, aTRH was significantly associated with similar comorbidities, when compared with stable controlled HTN. Conclusion: In two large, diverse real-world populations, we observed similar comorbidities and predictors of aTRH as prior studies. In the future, these results may be used to improve healthcare professionals' understanding of aTRH predictors and associated comorbidities.
Clinical Perspective: What Is New?: Prior studies of apparent treatment-resistant hypertension have focused on cohorts from smaller datasets, randomized controlled trials, or closed healthcare systems. We used validated computable phenotype algorithms for apparent treatment-resistant hypertension and stable controlled hypertension to identify the prevalence, characteristics, and predictors of apparent treatment-resistant hypertension in two large, diverse real-world populations. What Are the Clinical Implications?: Large, diverse real-world populations showed a similar prevalence of aTRH (16.7% in OneFlorida and 11.3% in REACHnet) compared to that observed in other cohorts. Patients classified as having apparent treatment-resistant hypertension were significantly older and had a higher prevalence of comorbid conditions such as diabetes, dyslipidemia, coronary artery disease, heart failure with preserved ejection fraction, and chronic kidney disease stages 1-3. Within diverse, real-world populations, the strongest predictors of apparent treatment-resistant hypertension were Black race, higher body mass index, heart failure, chronic kidney disease, and diabetes.

20.
J Am Med Inform Assoc ; 30(9): 1486-1493, 2023 08 18.
Article in English | MEDLINE | ID: mdl-37316988

ABSTRACT

OBJECTIVE: To develop a natural language processing system that solves both clinical concept extraction and relation extraction in a unified prompt-based machine reading comprehension (MRC) architecture with good generalizability for cross-institution applications. METHODS: We formulate both clinical concept extraction and relation extraction using a unified prompt-based MRC architecture and explore state-of-the-art transformer models. We compare our MRC models with existing deep learning models for concept extraction and end-to-end relation extraction using 2 benchmark datasets developed by the 2018 National NLP Clinical Challenges (n2c2) challenge (medications and adverse drug events) and the 2022 n2c2 challenge (relations of social determinants of health [SDoH]). We also evaluate the transfer learning ability of the proposed MRC models in a cross-institution setting. We perform error analyses and examine how different prompting strategies affect the performance of MRC models. RESULTS AND CONCLUSION: The proposed MRC models achieve state-of-the-art performance for clinical concept and relation extraction on the 2 benchmark datasets, outperforming previous non-MRC transformer models. GatorTron-MRC achieves the best strict and lenient F1-scores for concept extraction, outperforming previous deep learning models on the 2 datasets by 1%-3% and 0.7%-1.3%, respectively. For end-to-end relation extraction, GatorTron-MRC and BERT-MIMIC-MRC achieve the best F1-scores, outperforming previous deep learning models by 0.9%-2.4% and 10%-11%, respectively. For cross-institution evaluation, GatorTron-MRC outperforms traditional GatorTron by 6.4% and 16% for the 2 datasets, respectively. The proposed method is better at handling nested/overlapped concepts, extracting relations, and has good portability for cross-institution applications. Our clinical MRC package is publicly available at https://github.com/uf-hobi-informatics-lab/ClinicalTransformerMRC.
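The prompt-based MRC formulation turns each extraction task into a (question, context) pair that a reading-comprehension model answers with text spans. The templates below are hypothetical stand-ins to show the formulation; the paper's actual prompting strategies may differ.

```python
# Hypothetical question templates, one per concept or relation type.
TEMPLATES = {
    "Drug": "What drug is mentioned in the text?",
    "ADE": "What adverse drug event is caused by {drug}?",
}

def concept_query(context: str) -> dict:
    # Concept extraction as a reading-comprehension question.
    return {"question": TEMPLATES["Drug"], "context": context}

def relation_query(context: str, drug: str) -> dict:
    # Relation extraction becomes a second reading-comprehension pass,
    # conditioned on a concept found in the first pass.
    return {"question": TEMPLATES["ADE"].format(drug=drug), "context": context}

note = "Patient developed a rash after starting penicillin."
q1 = concept_query(note)
q2 = relation_query(note, "penicillin")
print(q2["question"])  # → What adverse drug event is caused by penicillin?
```

Both pairs would then be fed to a span-prediction transformer (e.g., GatorTron-MRC in the paper), which returns answer spans such as "penicillin" and "rash".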


Subject(s)
Comprehension , Drug-Related Side Effects and Adverse Reactions , Humans , Natural Language Processing