ABSTRACT
When developing models for clinical information retrieval and decision support systems, the discrete outcomes required for training are often missing. These labels need to be extracted from free text in electronic health records. For this extraction process, one of the most important contextual properties in clinical text is negation, which indicates the absence of findings. We aimed to improve large-scale extraction of labels by comparing three methods for negation detection in Dutch clinical notes. We used the Erasmus Medical Center Dutch Clinical Corpus to compare a rule-based method based on ContextD, a biLSTM model using MedCAT, and (fine-tuned) RoBERTa-based models. We found that both the biLSTM and RoBERTa models consistently outperform the rule-based model in terms of F1 score, precision, and recall. In addition, we systematically categorized the classification errors for each model, which can be used to further improve model performance in particular applications. Combining the three models naively was not beneficial in terms of performance. We conclude that the biLSTM and RoBERTa-based models in particular are highly accurate in detecting clinical negations, but that ultimately all three approaches can be viable depending on the use case at hand.
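As an illustration of the rule-based family of methods mentioned above, the following is a minimal sketch of trigger-based negation detection. It is not the ContextD rule set: the trigger list, the scope terminators, the window size `MAX_SCOPE`, and the function names are all assumptions chosen for the example.

```python
import re

# Hypothetical trigger words (Dutch and English); NOT the ContextD rule set.
NEGATION_TRIGGERS = {"geen", "niet", "zonder", "no", "not", "without"}
# Hypothetical tokens that end the scope of a negation trigger.
SCOPE_TERMINATORS = {"maar", "but", ".", ",", ";"}
# Assumed maximum number of tokens after a trigger that can be negated.
MAX_SCOPE = 5


def tokenize(text):
    """Lowercase the text and split it into words and punctuation tokens."""
    return re.findall(r"\w+|[.,;]", text.lower())


def is_negated(text, concept):
    """Return True if `concept` occurs inside the scope of a negation trigger."""
    tokens = tokenize(text)
    concept_tokens = tokenize(concept)
    n = len(concept_tokens)
    for i, token in enumerate(tokens):
        if token not in NEGATION_TRIGGERS:
            continue
        # Collect the tokens after the trigger, stopping at a scope terminator.
        window = []
        for following in tokens[i + 1 : i + 1 + MAX_SCOPE]:
            if following in SCOPE_TERMINATORS:
                break
            window.append(following)
        # Look for the concept as a contiguous token sequence in the window.
        for j in range(len(window) - n + 1):
            if window[j : j + n] == concept_tokens:
                return True
    return False
```

For example, `is_negated("Geen aanwijzingen voor pneumonie.", "pneumonie")` marks the concept as negated, whereas in "Geen koorts, maar wel hoest." the comma ends the scope so that "hoest" is not negated. A sketch like this only looks forward from the trigger and would miss negations that follow the concept.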
Subject(s)
Electronic Health Records, Machine Learning, Information Storage and Retrieval, Natural Language Processing
ABSTRACT
The success of endovascular thrombectomy (EVT) in treating acute ischemic stroke varies with factors such as stroke etiology and clot composition, which can differ between sexes. We studied whether sex-specific blood cell characteristics (BCCs) are related to recanalization success. We analyzed electronic health records of 333 EVT patients from a single intervention center, and extracted 71 BCCs from the Sapphire flow cytometry analyzer. Through Sparse Partial Least Squares Discriminant Analysis, incorporating cross-validation and stability selection, we identified BCCs associated with successful recanalization (TICI 3) in both sexes. Stroke etiology was considered, while controlling for cardiovascular risk factors. Successful recanalization was achieved in 51% of women and 49% of men. Of the 71 BCCs, 21 showed significant differences between sexes (pFDR-corrected < 0.05). The female-focused recanalization model had lower error rates than both the combined [t(192.4) = 5.9, p < 0.001] and male-only models [t(182.6) = -15.6, p < 0.001]. In women, successful recanalization and cardioembolism were associated with a higher number of reticulocytes, while unsuccessful recanalization and large artery atherosclerosis (LAA) as cause of stroke were associated with a higher mean corpuscular hemoglobin concentration. In men, unsuccessful recanalization and LAA as cause of stroke were associated with a higher coefficient of variance of lymphocyte complexity of the intracellular structure. Sex-specific BCCs related to recanalization success varied and were linked to stroke etiology. This enhanced understanding may facilitate personalized treatment for acute ischemic stroke.
Subject(s)
Atherosclerosis, Brain Ischemia, Endovascular Procedures, Ischemic Stroke, Stroke, Humans, Male, Female, Ischemic Stroke/surgery, Ischemic Stroke/etiology, Brain Ischemia/etiology, Sex Characteristics, Treatment Outcome, Retrospective Studies, Thrombectomy/adverse effects, Stroke/etiology, Blood Cells, Atherosclerosis/etiology
ABSTRACT
Biological processes underlying decreased cerebral blood flow (CBF) in patients with cardiovascular disease (CVD) are largely unknown. We hypothesized that identification of protein clusters associated with lower CBF in patients with CVD may explain underlying processes. In 428 participants (74% cardiovascular diseases; 26% reference participants) from the Heart-Brain Connection Study, we assessed the relationship between 92 plasma proteins from the Olink® cardiovascular III panel and normal-appearing grey matter CBF, using affinity propagation and hierarchical clustering algorithms, and generated a Biomarker Compound Score (BCS). The BCS was related to cardiovascular risk and observed cardiovascular events within 2-year follow-up using Spearman correlation and logistic regression. Thirteen proteins were associated with CBF (ρSpearman range: -0.10 to -0.19, pFDR-corrected <0.05), and formed one cluster. The cluster primarily reflected extracellular matrix organization processes. The BCS was higher in patients with CVD compared to reference participants (pFDR-corrected <0.05) and was associated with cardiovascular risk (ρSpearman 0.42, p < 0.001) and cardiovascular events (OR 2.05, p < 0.01). In conclusion, we identified a cluster of plasma proteins related to CBF, reflecting extracellular matrix organization processes, that is also related to future cardiovascular events in patients with CVD, representing potential targets to preserve CBF and mitigate cardiovascular risk in patients with CVD.
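The abstract reports FDR-corrected p-values without naming the correction procedure. Assuming the commonly used Benjamini-Hochberg procedure (an assumption, not something the abstract confirms), the adjustment can be sketched as follows; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: the FDR correction is Benjamini-Hochberg; the abstract
    does not state which procedure was used.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four proteins, for illustration only.
example = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

With 92 proteins tested, each raw p-value is scaled by 92 divided by its rank, so only associations with small raw p-values remain below 0.05 after correction.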
Subject(s)
Cardiovascular Diseases, Humans, Brain, Blood Proteins, Biomarkers, Cerebrovascular Circulation/physiology
ABSTRACT
Aims: With the ageing European population, the incidence of coronary artery disease (CAD) is expected to rise. This will likely result in increased use of imaging. Symptom recognition can be complicated, as symptoms caused by CAD can be atypical, particularly in women. Early CAD exclusion may help to optimize use of diagnostic resources and thus improve the sustainability of the healthcare system. We aimed to develop sex-stratified algorithms, trained on routinely available electronic health records (EHRs), raw electrocardiograms, and haematology data, to exclude CAD in patients upfront. Methods and results: We trained XGBoost algorithms on data from patients from the Utrecht Patient-Oriented Database, who underwent coronary computed tomography angiography (CCTA), and/or stress cardiac magnetic resonance (CMR) imaging, or stress single-photon emission computerized tomography (SPECT) in the UMC Utrecht. Outcomes were extracted from radiology reports. We aimed to maximize negative predictive value (NPV) to minimize the false negative risk with acceptable specificity. Of 6808 CCTA patients (31% female), 1029 females (48%) and 1908 males (45%) had no diagnosis of CAD. Of 3053 CMR/SPECT patients (45% female), 650 females (47%) and 881 males (48%) had no diagnosis of CAD. On the train and test set, the CCTA models achieved NPVs and specificities of 0.95 and 0.19 (females) and 0.96 and 0.09 (males). The CMR/SPECT models achieved NPVs and specificities of 0.75 and 0.041 (females) and 0.92 and 0.026 (males). Conclusion: Coronary artery disease can be excluded from EHRs with high NPV. Our study demonstrates new possibilities to reduce unnecessary imaging in women and men suspected of CAD.
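To make the two reported metrics concrete, the sketch below computes NPV and specificity from confusion-matrix counts. The counts are invented for illustration and are not taken from the study; they were chosen only so that the results equal 0.95 and 0.19.

```python
def negative_predictive_value(true_neg, false_neg):
    """Fraction of patients predicted CAD-free who truly have no CAD."""
    return true_neg / (true_neg + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of patients without CAD who are correctly predicted CAD-free."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts (NOT study data): 100 patients without CAD, of whom
# 19 are correctly excluded, and 1 patient with CAD who is wrongly excluded.
true_neg, false_pos, false_neg = 19, 81, 1

npv_value = negative_predictive_value(true_neg, false_neg)   # 19 / 20
spec_value = specificity(true_neg, false_pos)                # 19 / 100
```

The combination illustrates the trade-off described in the abstract: an exclusion rule tuned for high NPV rarely misses CAD among the patients it excludes, but at low specificity it excludes only a small share of the patients who are actually free of CAD.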
ABSTRACT
Biological processes underlying cerebral small vessel disease (cSVD) are largely unknown. We hypothesized that identification of clusters of inter-related blood-based biomarkers that are associated with the burden of cSVD provides leads on underlying biological processes. In 494 participants (mean age 67.6 ± 8.7 years; 36% female; 75% cardiovascular diseases; 25% reference participants) we assessed the relation between 92 blood-based biomarkers from the OLINK cardiovascular III panel and cSVD, using cluster-based analyses. We focused particularly on white matter hyperintensities (WMH). Nineteen biomarkers individually correlated with WMH ratio (r range: 0.16-0.27, Bonferroni-corrected p-values <0.05), of which sixteen formed one biomarker cluster. Pathway analysis showed that this biomarker cluster predominantly reflected coagulation processes. This cluster was also significantly related to other cSVD manifestations (lacunar infarcts, microbleeds, and enlarged perivascular spaces), which supports generalizability beyond WMH. To study possible causal effects of biological processes reflected by the cluster, we performed a mediation analysis that showed a mediation effect of the cluster on the relation between age and WMH ratio (proportion mediated 17%), and between hypertension and WMH volume (proportion mediated 21%). In conclusion, we identified a cluster of blood-based biomarkers reflecting coagulation that is related to manifestations of cSVD, corroborating involvement of coagulation abnormalities in the etiology of cSVD.
Subject(s)
Cerebral Small Vessel Diseases, Lacunar Stroke, Aged, Biomarkers, Cerebral Small Vessel Diseases/complications, Female, Humans, Magnetic Resonance Imaging, Male, Middle Aged, Risk Factors
ABSTRACT
BACKGROUND: The new concept of difficult-to-treat rheumatoid arthritis (D2T RA) refers to RA patients who remain symptomatic after several lines of treatment, resulting in a high patient and economic burden. During a hackathon, we aimed to identify and predict D2T RA patients in structured and unstructured routine care data. METHODS: Routine care data of 1873 RA patients were extracted from the Utrecht Patient Oriented Database. Data from a previous cross-sectional study, in which 152 RA patients were clinically classified as either D2T or non-D2T, served as a validation set. Machine learning techniques, text mining, and feature importance analyses were performed to identify and predict D2T RA patients based on structured and unstructured routine care data. RESULTS: We identified 123 potentially new D2T RA patients by applying the D2T RA definition in structured and unstructured routine care data. Additionally, we developed a D2T RA identification model derived from a feature importance analysis of all available structured data (AUC-ROC 0.88 (95% CI 0.82-0.94)), and we demonstrated the potential of longitudinal hematological data to differentiate D2T from non-D2T RA patients using supervised dimension reduction. Lastly, using data up to the time of starting the first biological treatment, we predicted future development of D2T RA (AUC-ROC 0.73 (95% CI 0.71-0.75)). CONCLUSIONS: During this hackathon, we demonstrated the potential of different techniques for the identification and prediction of D2T RA patients in structured as well as unstructured routine care data. The results are promising and should be optimized and validated in future research.
Subject(s)
Rheumatoid Arthritis, Rheumatoid Arthritis/diagnosis, Rheumatoid Arthritis/drug therapy, Factual Databases, Humans, Machine Learning
ABSTRACT
INTRODUCTION: Non-small-cell lung cancer exhibits a range of transcriptional and epigenetic patterns that not only define distinct phenotypes, but may also govern immune-related genes, which have a major impact on survival. METHODS: We used open-source RNA expression and DNA methylation data of the Cancer Genome Atlas with matched non-cancerous tissue to evaluate whether these pretreatment molecular patterns also influenced genes related to the immune system and overall survival. RESULTS: The distinction between lung adenocarcinoma and squamous cell carcinoma is determined by 1083 conserved methylation loci and RNA expression of 203 genes, which differ for >80% of patients between the two subtypes. Using the RNA expression profiles of 6 genes, more than 95% of patients could be correctly classified as having either adenocarcinoma or squamous cell lung cancer. Comparing tumor tissue with matched normal tissue, no differences in RNA expression were found for costimulatory and co-inhibitory genes, nor for genes involved in cytokine release. However, genes involved in antigen presentation had a lower expression and a wider distribution in tumor tissue. DISCUSSION: Only a small number of genes, influenced by DNA methylation, determine the lung cancer subtype. The antigen presentation of cancer cells is dysfunctional, while other T cell immune functions appear to remain intact.