Search | VHL Regional Portal

Electronic Medical Record-Based Machine Learning Approach to Predict the Risk of 30-Day Adverse Cardiac Events After Invasive Coronary Treatment: Machine Learning Model Development and Validation.

Kwon, Osung; Na, Wonjun; Kang, Heejun; Jun, Tae Joon; Kweon, Jihoon; Park, Gyung-Min; Cho, YongHyun; Hur, Cinyoung; Chae, Jungwoo; Kang, Do-Yoon; Lee, Pil Hyung; Ahn, Jung-Min; Park, Duk-Woo; Kang, Soo-Jin; Lee, Seung-Whan; Lee, Cheol Whan; Park, Seong-Wook; Park, Seung-Jung; Yang, Dong Hyun; Kim, Young-Hak.

JMIR Med Inform ; 10(5): e26801, 2022 May 11.

Article in English | MEDLINE | ID: mdl-35544292

ABSTRACT

BACKGROUND: Although there is a growing interest in prediction models based on electronic medical records (EMRs) to identify patients at risk of adverse cardiac events following invasive coronary treatment, robust models fully utilizing EMR data are limited. OBJECTIVE: We aimed to develop and validate machine learning (ML) models by using diverse fields of EMR to predict the risk of 30-day adverse cardiac events after percutaneous intervention or bypass surgery. METHODS: EMR data of 5,184,565 records of 16,793 patients at a quaternary hospital between 2006 and 2016 were categorized into static basic (eg, demographics), dynamic time-series (eg, laboratory values), and cardiac-specific data (eg, coronary angiography). The data were randomly split into training, tuning, and testing sets in a ratio of 3:1:1. Each model was evaluated with 5-fold cross-validation and with an external EMR-based cohort at a tertiary hospital. Logistic regression (LR), random forest (RF), gradient boosting machine (GBM), and feedforward neural network (FNN) algorithms were applied. The primary outcome was 30-day mortality following invasive treatment. RESULTS: GBM showed the best performance with area under the receiver operating characteristic curve (AUROC) of 0.99; RF had a similar AUROC of 0.98. AUROCs of FNN and LR were 0.96 and 0.93, respectively. GBM had the highest area under the precision-recall curve (AUPRC) of 0.80, and the AUPRCs of RF, LR, and FNN were 0.73, 0.68, and 0.63, respectively. All models showed low Brier scores of <0.1 as well as highly fitted calibration plots, indicating a good fit of the ML-based models. On external validation, the GBM model demonstrated maximal performance with an AUROC of 0.90, while FNN had an AUROC of 0.85. The AUROCs of LR and RF were slightly lower at 0.80 and 0.79, respectively. The AUPRCs of GBM, LR, and FNN were similar at 0.47, 0.43, and 0.41, respectively, while that of RF was lower at 0.33. Among the categories in the GBM model, time-series dynamic data demonstrated a high AUROC of >0.95, contributing majorly to the excellent results. CONCLUSIONS: Exploiting the diverse fields of the EMR data set, the ML-based 30-day adverse cardiac event prediction models demonstrated outstanding results, and the applied framework could be generalized for various health care prediction models.

Facilitating the Development of Deep Learning Models with Visual Analytics for Electronic Health Records.

Hur, Cinyoung; Wi, JeongA; Kim, YoungBin.

Int J Environ Res Public Health ; 17(22)2020 11 10.

Article in English | MEDLINE | ID: mdl-33182703

ABSTRACT

Electronic health record (EHR) data are widely used to perform early diagnoses and create treatment plans, which are key areas of research. We aimed to increase the efficiency of iteratively applying data-intensive technology and verifying the results for complex and big EHR data. We used a system entailing sequence mining, interpretable deep learning models, and visualization on data extracted from the MIMIC-IIIdatabase for a group of patients diagnosed with heart disease. The results of sequence mining corresponded to specific pathways of interest to medical staff and were used to select patient groups that underwent these pathways. An interactive Sankey diagram representing these pathways and a heat map visually representing the weight of each variable were developed for temporal and quantitative illustration. We applied the proposed system to predict unplanned cardiac surgery using clinical pathways determined by sequence pattern mining to select cardiac surgery from complex EHRs to label subject groups and deep learning models. The proposed system aids in the selection of pathway-based patient groups, simplification of labeling, and exploratory the interpretation of the modeling results. The proposed system can help medical staff explore various pathways that patients have undergone and further facilitate the testing of various clinical hypotheses using big data in the medical domain.

Subject(s)

Computer Simulation , Deep Learning , Electronic Health Records , Big Data , Humans

Identifying key hospital service quality factors in online health communities.

Jung, Yuchul; Hur, Cinyoung; Jung, Dain; Kim, Minki.

J Med Internet Res ; 17(4): e90, 2015 Apr 07.

Article in English | MEDLINE | ID: mdl-25855612

ABSTRACT

BACKGROUND: The volume of health-related user-created content, especially hospital-related questions and answers in online health communities, has rapidly increased. Patients and caregivers participate in online community activities to share their experiences, exchange information, and ask about recommended or discredited hospitals. However, there is little research on how to identify hospital service quality automatically from the online communities. In the past, in-depth analysis of hospitals has used random sampling surveys. However, such surveys are becoming impractical owing to the rapidly increasing volume of online data and the diverse analysis requirements of related stakeholders. OBJECTIVE: As a solution for utilizing large-scale health-related information, we propose a novel approach to identify hospital service quality factors and overtime trends automatically from online health communities, especially hospital-related questions and answers. METHODS: We defined social media-based key quality factors for hospitals. In addition, we developed text mining techniques to detect such factors that frequently occur in online health communities. After detecting these factors that represent qualitative aspects of hospitals, we applied a sentiment analysis to recognize the types of recommendations in messages posted within online health communities. Korea's two biggest online portals were used to test the effectiveness of detection of social media-based key quality factors for hospitals. RESULTS: To evaluate the proposed text mining techniques, we performed manual evaluations on the extraction and classification results, such as hospital name, service quality factors, and recommendation types using a random sample of messages (ie, 5.44% (9450/173,748) of the total messages). Service quality factor detection and hospital name extraction achieved average F1 scores of 91% and 78%, respectively. In terms of recommendation classification, performance (ie, precision) is 78% on average. Extraction and classification performance still has room for improvement, but the extraction results are applicable to more detailed analysis. Further analysis of the extracted information reveals that there are differences in the details of social media-based key quality factors for hospitals according to the regions in Korea, and the patterns of change seem to accurately reflect social events (eg, influenza epidemics). CONCLUSIONS: These findings could be used to provide timely information to caregivers, hospital officials, and medical officials for health care policies.

Subject(s)

Hospitals/standards , Quality of Health Care , Social Media , Caregivers , Humans , Internet , Republic of Korea

Investigating the congruence of crowdsourced information with official government data: the case of pediatric clinics.

Kim, Minki; Jung, Yuchul; Jung, Dain; Hur, Cinyoung.

J Med Internet Res ; 16(2): e29, 2014 Feb 03.

Article in English | MEDLINE | ID: mdl-24496094

ABSTRACT

BACKGROUND: Health 2.0 is a benefit to society by helping patients acquire knowledge about health care by harnessing collective intelligence. However, any misleading information can directly affect patients' choices of hospitals and drugs, and potentially exacerbate their health condition. OBJECTIVE: This study investigates the congruence between crowdsourced information and official government data in the health care domain and identifies the determinants of low congruence where it exists. In-line with infodemiology, we suggest measures to help the patients in the regions vulnerable to inaccurate health information. METHODS: We text-mined multiple online health communities in South Korea to construct the data for crowdsourced information on public health services (173,748 messages). Kendall tau and Spearman rank order correlation coefficients were used to compute the differences in 2 ranking systems of health care quality: actual government evaluations of 779 hospitals and mining results of geospecific online health communities. Then we estimated the effect of sociodemographic characteristics on the level of congruence by using an ordinary least squares regression. RESULTS: The regression results indicated that the standard deviation of married women's education (P=.046), population density (P=.01), number of doctors per pediatric clinic (P=.048), and birthrate (P=.002) have a significant effect on the congruence of crowdsourced data (adjusted R²=.33). Specifically, (1) the higher the birthrate in a given region, (2) the larger the variance in educational attainment, (3) the higher the population density, and (4) the greater the number of doctors per clinic, the more likely that crowdsourced information from online communities is congruent with official government data. CONCLUSIONS: To investigate the cause of the spread of misleading health information in the online world, we adopted a unique approach by associating mining results on hospitals from geospecific online health communities with the sociodemographic characteristics of corresponding regions. We found that the congruence of crowdsourced information on health care services varied across regions and that these variations could be explained by geospecific demographic factors. This finding can be helpful to governments in reducing the potential risk of misleading online information and the accompanying safety issues.

Subject(s)

Child Health Services/standards , Crowdsourcing , Hospitals, Pediatric/standards , Pediatrics/standards , Anti-Bacterial Agents/therapeutic use , Child , Data Mining , Delivery of Health Care , Federal Government , Hospitals, Urban/standards , Humans , Least-Squares Analysis , Online Systems , Republic of Korea , Socioeconomic Factors , Unnecessary Procedures/statistics & numerical data

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL