Search | BVS IEC

1.

Term-BLAST-like alignment tool for concept recognition in noisy clinical texts.

Groza, Tudor; Wu, Honghan; Dinger, Marcel E; Danis, Daniel; Hilton, Coleman; Bagley, Anita; Davids, Jon R; Luo, Ling; Lu, Zhiyong; Robinson, Peter N.

Bioinformatics ; 39(12)2023 12 01.

Article in English | MEDLINE | ID: mdl-38001031

ABSTRACT

MOTIVATION: Methods for concept recognition (CR) in clinical texts have largely been tested on abstracts or articles from the medical literature. However, texts from electronic health records (EHRs) frequently contain spelling errors, abbreviations, and other nonstandard ways of representing clinical concepts. RESULTS: Here, we present a method inspired by the BLAST algorithm for biosequence alignment that screens texts for potential matches on the basis of matching k-mer counts and scores candidates based on conformance to typical patterns of spelling errors derived from 2.9 million clinical notes. Our method, the Term-BLAST-like alignment tool (TBLAT) leverages a gold standard corpus for typographical errors to implement a sequence alignment-inspired method for efficient entity linkage. We present a comprehensive experimental comparison of TBLAT with five widely used tools. Experimental results show an increase of 10% in recall on scientific publications and 20% increase in recall on EHR records (when compared against the next best method), hence supporting a significant enhancement of the entity linking task. The method can be used stand-alone or as a complement to existing approaches. AVAILABILITY AND IMPLEMENTATION: Fenominal is a Java library that implements TBLAT for named CR of Human Phenotype Ontology terms and is available at https://github.com/monarch-initiative/fenominal under the GNU General Public License v3.0.
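The k-mer screening idea described in this abstract can be illustrated with a minimal sketch. It is not the Fenominal/TBLAT implementation: the function names, the choice of k = 3, and the 0.6 threshold are assumptions made here for illustration, and the scoring of candidates against learned spelling-error patterns is not attempted.

```python
from collections import Counter


def kmers(text, k=3):
    """Count the overlapping character k-mers of a lowercased string."""
    s = text.lower()
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def kmer_overlap(term, text, k=3):
    """Fraction of the term's k-mers (with multiplicity) that also occur in the text."""
    term_counts, text_counts = kmers(term, k), kmers(text, k)
    total = sum(term_counts.values())
    if total == 0:
        return 0.0
    shared = sum((term_counts & text_counts).values())  # multiset intersection
    return shared / total


def screen(terms, text, k=3, threshold=0.6):
    """Keep ontology terms whose k-mer overlap with the text reaches the threshold."""
    return [term for term in terms if kmer_overlap(term, text, k) >= threshold]


# A misspelled mention still shares most of its k-mers with the correct term.
candidates = screen(["seizure", "scoliosis"], "pt had a seizurre last night")
```

A screen of this kind only shortlists candidates cheaply; a second, more careful alignment or scoring step would still be needed to confirm each match.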

Subjects

Algorithms , Language , Humans , Sequence Alignment , Electronic Health Records , Publications

2.

Antidepressant and antipsychotic prescribing in patients with type 2 diabetes in Scotland: A time-trend analysis from 2004 to 2021.

Greene, Charlotte R L; Blackbourn, Luke A K; McGurnaghan, Stuart J; Mercer, Stewart W; Smith, Daniel J; Wild, Sarah H; Wu, Honghan; Jackson, Caroline A.

Br J Clin Pharmacol ; 2024 Jul 09.

Article in English | MEDLINE | ID: mdl-38981672

ABSTRACT

AIMS: Prescribing of antidepressant and antipsychotic drugs in general populations has increased in the United Kingdom, but prescribing trends in people with type 2 diabetes (T2D) have not previously been investigated. The aim of this study was to describe time trends in annual prevalence of antidepressant and antipsychotic drug prescribing in adult patients with T2D. METHODS: We conducted repeated annual cross-sectional analyses of a population-based diabetes registry with 99% coverage, derived from primary and secondary care data in Scotland, from 2004 to 2021. For each cross-sectional calendar year time period, we calculated the prevalence of antidepressant and antipsychotic drug prescribing, overall and by sociodemographic characteristics and drug subtype. RESULTS: The number of patients with a T2D diagnosis in Scotland increased from 161 915 in 2004 to 309 288 in 2021. Prevalence of antidepressant and antipsychotic prescribing in patients with T2D increased markedly between 2004 and 2021 (from 20.0 per 100 person-years to 33.3 per 100 person-years and from 2.8 per 100 person-years to 4.7 per 100 person-years, respectively). We observed this pattern for all drug subtypes except for first-generation antipsychotics, prescribing of which remained largely stable. The degree of increase, as well as the overall prevalence of prescribing, differed by age, sex, socioeconomic status and subtype of drug class. CONCLUSIONS: There has been a marked increase in the prevalence of antidepressant and antipsychotic prescribing in patients with T2D in Scotland. Further research should identify the reasons for this increase, including indication for use and the extent to which this reflects increases in incident prescribing rather than increased duration.

3.

Benchmarking network-based gene prioritization methods for cerebral small vessel disease.

Zhang, Huayu; Ferguson, Amy; Robertson, Grant; Jiang, Muchen; Zhang, Teng; Sudlow, Cathie; Smith, Keith; Rannikmae, Kristiina; Wu, Honghan.

Brief Bioinform ; 22(5)2021 09 02.

Article in English | MEDLINE | ID: mdl-33634312

ABSTRACT

Network-based gene prioritization algorithms are designed to prioritize disease-associated genes based on known ones using biological networks of protein interactions, gene-disease associations (GDAs) and other relationships between biological entities. Various algorithms have been developed based on different mechanisms, but it is not obvious which algorithm is optimal for a specific disease. To address this issue, we benchmarked multiple algorithms for their application in cerebral small vessel disease (cSVD). We curated protein-gene interactions (PGIs) and GDAs from databases and assembled PGI networks and disease-gene heterogeneous networks. A screening of algorithms resulted in seven representative algorithms to be benchmarked. Performance of algorithms was assessed using both leave-one-out cross-validation (LOOCV) and external validation with MEGASTROKE genome-wide association study (GWAS). We found that random walk with restart on the heterogeneous network (RWRH) showed best LOOCV performance, with median LOOCV rediscovery rank of 185.5 (out of 19 463 genes). The GenePanda algorithm had most GWAS-confirmable genes in top 200 predictions, while RWRH had best ranks for small vessel stroke-associated genes confirmed in GWAS. In conclusion, RWRH has overall better performance for application in cSVD despite its susceptibility to bias caused by degree centrality. Choice of algorithms should be determined before applying to specific disease. Current pure network-based gene prioritization algorithms are unlikely to find novel disease-associated genes that are not associated with known ones. The tools for implementing and benchmarking algorithms have been made available and can be generalized for other diseases.
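For readers unfamiliar with random walk with restart, the following is a minimal sketch on a single (homogeneous) undirected network. It is not the heterogeneous-network variant (RWRH) benchmarked in the paper, nor the authors' released tooling; the function name, the restart probability of 0.7, and the toy graph are assumptions for illustration, and every node is assumed to have at least one neighbour.

```python
def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at the seeds.

    adj   : dict mapping each node to a list of its neighbours (undirected graph)
    seeds : set of known disease-associated nodes
    """
    nodes = sorted(adj)
    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            neighbours = adj[u]
            if not neighbours:
                continue  # isolated nodes would leak probability mass
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path graph A - B - C - D with A as the only known disease gene.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(graph, {"A"})
ranked = sorted((n for n in graph if n != "A"), key=scores.get, reverse=True)
```

Candidates are ranked by their steady-state score, so genes close to the seeds in the network rise to the top. Proximity to well-connected seeds is also why such methods can be biased by node degree.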

Subjects

Benchmarking/methods , Cerebral Small Vessel Diseases/genetics , Computational Biology/methods , Gene Regulatory Networks , Protein Interaction Maps/genetics , Algorithms , Genome-Wide Association Study , Humans , Multigene Family , Phenotype , Risk Factors

4.

Ontology-driven and weakly supervised rare disease identification from clinical notes.

Dong, Hang; Suárez-Paniagua, Víctor; Zhang, Huayu; Wang, Minhong; Casey, Arlene; Davidson, Emma; Chen, Jiaoyan; Alex, Beatrice; Whiteley, William; Wu, Honghan.

BMC Med Inform Decis Mak ; 23(1): 86, 2023 05 05.

Article in English | MEDLINE | ID: mdl-37147628

ABSTRACT

BACKGROUND: Computational text phenotyping is the practice of identifying patients with certain disorders and traits from clinical notes. Rare diseases are challenging to be identified due to few cases available for machine learning and the need for data annotation from domain experts. METHODS: We propose a method using ontologies and weak supervision, with recent pre-trained contextual representations from Bi-directional Transformers (e.g. BERT). The ontology-driven framework includes two steps: (i) Text-to-UMLS, extracting phenotypes by contextually linking mentions to concepts in Unified Medical Language System (UMLS), with a Named Entity Recognition and Linking (NER+L) tool, SemEHR, and weak supervision with customised rules and contextual mention representation; (ii) UMLS-to-ORDO, matching UMLS concepts to rare diseases in Orphanet Rare Disease Ontology (ORDO). The weakly supervised approach is proposed to learn a phenotype confirmation model to improve Text-to-UMLS linking, without annotated data from domain experts. We evaluated the approach on three clinical datasets, MIMIC-III discharge summaries, MIMIC-III radiology reports, and NHS Tayside brain imaging reports from two institutions in the US and the UK, with annotations. RESULTS: The improvements in the precision were pronounced (by over 30% to 50% absolute score for Text-to-UMLS linking), with almost no loss of recall compared to the existing NER+L tool, SemEHR. Results on radiology reports from MIMIC-III and NHS Tayside were consistent with the discharge summaries. The overall pipeline processing clinical notes can extract rare disease cases, mostly uncaptured in structured data (manually assigned ICD codes). CONCLUSION: The study provides empirical evidence for the task by applying a weakly supervised NLP pipeline on clinical notes. 
The proposed weak supervised deep learning approach requires no human annotation except for validation and testing, by leveraging ontologies, NER+L tools, and contextual representations. The study also demonstrates that Natural Language Processing (NLP) can complement traditional ICD-based approaches to better estimate rare diseases in clinical notes. We discuss the usefulness and limitations of the weak supervision approach and propose directions for future studies.

Subjects

Natural Language Processing , Rare Diseases , Humans , Rare Diseases/diagnosis , Machine Learning , Unified Medical Language System , International Classification of Diseases

5.

Learning-based fully automated prediction of lumbar disc degeneration progression with specified clinical parameters and preliminary validation.

Cheung, Jason Pui Yin; Kuang, Xihe; Lai, Marcus Kin Long; Cheung, Kenneth Man-Chee; Karppinen, Jaro; Samartzis, Dino; Wu, Honghan; Zhao, Fengdong; Zheng, Zhaomin; Zhang, Teng.

Eur Spine J ; 31(8): 1960-1968, 2022 08.

Article in English | MEDLINE | ID: mdl-34657211

ABSTRACT

BACKGROUND: Lumbar disc degeneration (LDD) may be related to aging, biomechanical and genetic factors. Despite the extensive work on understanding its etiology, there is currently no automated tool for accurate prediction of its progression. PURPOSE: We aim to establish a novel deep learning-based pipeline to predict the progression of LDD-related findings using lumbar MRIs. MATERIALS AND METHODS: We utilized our dataset with MRIs acquired from 1,343 individual participants (taken at the baseline and the 5-year follow-up timepoint), and progression assessments (the Schneiderman score, disc bulging, and Pfirrmann grading) that were labelled by spine specialists with over ten years clinical experience. Our new pipeline was realized by integrating the MRI-SegFlow and the Visual Geometry Group-Medium (VGG-M) for automated disc region detection and LDD progression prediction correspondingly. The LDD progression was quantified by comparing the Schneiderman score, disc bulging and Pfirrmann grading at the baseline and at follow-up. A fivefold cross-validation was conducted to assess the predictive performance of the new pipeline. RESULTS: Our pipeline achieved very good performances on the LDD progression prediction, with high progression prediction accuracy of the Schneiderman score (Accuracy: 90.2 ± 0.9%), disc bulging (Accuracy: 90.4% ± 1.1%), and Pfirrmann grading (Accuracy: 89.9% ± 2.1%). CONCLUSION: This is the first attempt of using deep learning to predict LDD progression on a large dataset with 5-year follow-up. Requiring no human interference, our pipeline can potentially achieve similar predictive performances in new settings with minimal efforts.

Subjects

Intervertebral Disc Degeneration , Humans , Intervertebral Disc Degeneration/diagnostic imaging , Intervertebral Disc Degeneration/genetics , Lumbar Vertebrae/diagnostic imaging , Magnetic Resonance Imaging

6.

Evaluation and improvement of the National Early Warning Score (NEWS2) for COVID-19: a multi-hospital study.

Carr, Ewan; Bendayan, Rebecca; Bean, Daniel; Stammers, Matt; Wang, Wenjuan; Zhang, Huayu; Searle, Thomas; Kraljevic, Zeljko; Shek, Anthony; Phan, Hang T T; Muruet, Walter; Gupta, Rishi K; Shinton, Anthony J; Wyatt, Mike; Shi, Ting; Zhang, Xin; Pickles, Andrew; Stahl, Daniel; Zakeri, Rosita; Noursadeghi, Mahdad; O'Gallagher, Kevin; Rogers, Matt; Folarin, Amos; Karwath, Andreas; Wickstrøm, Kristin E; Köhn-Luque, Alvaro; Slater, Luke; Cardoso, Victor Roth; Bourdeaux, Christopher; Holten, Aleksander Rygh; Ball, Simon; McWilliams, Chris; Roguski, Lukasz; Borca, Florina; Batchelor, James; Amundsen, Erik Koldberg; Wu, Xiaodong; Gkoutos, Georgios V; Sun, Jiaxing; Pinto, Ashwin; Guthrie, Bruce; Breen, Cormac; Douiri, Abdel; Wu, Honghan; Curcin, Vasa; Teo, James T; Shah, Ajay M; Dobson, Richard J B.

BMC Med ; 19(1): 23, 2021 01 21.

Article in English | MEDLINE | ID: mdl-33472631

ABSTRACT

BACKGROUND: The National Early Warning Score (NEWS2) is currently recommended in the UK for the risk stratification of COVID-19 patients, but little is known about its ability to detect severe cases. We aimed to evaluate NEWS2 for the prediction of severe COVID-19 outcome and identify and validate a set of blood and physiological parameters routinely collected at hospital admission to improve upon the use of NEWS2 alone for medium-term risk stratification. METHODS: Training cohorts comprised 1276 patients admitted to King's College Hospital National Health Service (NHS) Foundation Trust with COVID-19 disease from 1 March to 30 April 2020. External validation cohorts included 6237 patients from five UK NHS Trusts (Guy's and St Thomas' Hospitals, University Hospitals Southampton, University Hospitals Bristol and Weston NHS Foundation Trust, University College London Hospitals, University Hospitals Birmingham), one hospital in Norway (Oslo University Hospital), and two hospitals in Wuhan, China (Wuhan Sixth Hospital and Taikang Tongji Hospital). The outcome was severe COVID-19 disease (transfer to intensive care unit (ICU) or death) at 14 days after hospital admission. Age, physiological measures, blood biomarkers, sex, ethnicity, and comorbidities (hypertension, diabetes, cardiovascular, respiratory and kidney diseases) measured at hospital admission were considered in the models. RESULTS: A baseline model of 'NEWS2 + age' had poor-to-moderate discrimination for severe COVID-19 infection at 14 days (area under receiver operating characteristic curve (AUC) in training cohort = 0.700, 95% confidence interval (CI) 0.680, 0.722; Brier score = 0.192, 95% CI 0.186, 0.197). 
A supplemented model adding eight routinely collected blood and physiological parameters (supplemental oxygen flow rate, urea, age, oxygen saturation, C-reactive protein, estimated glomerular filtration rate, neutrophil count, neutrophil/lymphocyte ratio) improved discrimination (AUC = 0.735; 95% CI 0.715, 0.757), and these improvements were replicated across seven UK and non-UK sites. However, there was evidence of miscalibration with the model tending to underestimate risks in most sites. CONCLUSIONS: NEWS2 score had poor-to-moderate discrimination for medium-term COVID-19 outcome which raises questions about its use as a screening tool at hospital admission. Risk stratification was improved by including readily available blood and physiological parameters measured at hospital admission, but there was evidence of miscalibration in external sites. This highlights the need for a better understanding of the use of early warning scores for COVID.
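The two metrics quoted in this abstract, AUC for discrimination and the Brier score for overall accuracy of predicted probabilities, can be computed as in the sketch below. The function names and the four-patient toy data are assumptions for illustration; they are not taken from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(labels, probs):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


# Toy data: 1 = severe outcome within 14 days, 0 = not severe.
outcomes = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(outcomes, predicted)
toy_brier = brier_score(outcomes, predicted)
```

AUC depends only on how cases are ranked, so a model can discriminate reasonably well and still be miscalibrated. That is the pattern the abstract reports for the external sites.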

Subjects

COVID-19/diagnosis , Early Warning Score , Aged , COVID-19/epidemiology , COVID-19/virology , Cohort Studies , Electronic Health Records , Female , Humans , Male , Middle Aged , Pandemics , Prognosis , SARS-CoV-2/isolation & purification , State Medicine , United Kingdom/epidemiology

7.

Explainable automated coding of clinical notes using hierarchical label-wise attention networks and label embedding initialisation.

Dong, Hang; Suárez-Paniagua, Víctor; Whiteley, William; Wu, Honghan.

J Biomed Inform ; 116: 103728, 2021 04.

Article in English | MEDLINE | ID: mdl-33711543

ABSTRACT

BACKGROUND: Diagnostic or procedural coding of clinical notes aims to derive a coded summary of disease-related information about patients. Such coding is usually done manually in hospitals but could potentially be automated to improve the efficiency and accuracy of medical coding. Recent studies on deep learning for automated medical coding achieved promising performances. However, the explainability of these models is usually poor, preventing them to be used confidently in supporting clinical practice. Another limitation is that these models mostly assume independence among labels, ignoring the complex correlations among medical codes which can potentially be exploited to improve the performance. METHODS: To address the issues of model explainability and label correlations, we propose a Hierarchical Label-wise Attention Network (HLAN), which aimed to interpret the model by quantifying importance (as attention weights) of words and sentences related to each of the labels. Secondly, we propose to enhance the major deep learning models with a label embedding (LE) initialisation approach, which learns a dense, continuous vector representation and then injects the representation into the final layers and the label-wise attention layers in the models. We evaluated the methods using three settings on the MIMIC-III discharge summaries: full codes, top-50 codes, and the UK NHS (National Health Service) COVID-19 (Coronavirus disease 2019) shielding codes. Experiments were conducted to compare the HLAN model and label embedding initialisation to the state-of-the-art neural network based methods, including variants of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). RESULTS: HLAN achieved the best Micro-level AUC and F1 on the top-50 code prediction, 91.9% and 64.1%, respectively; and comparable results on the NHS COVID-19 shielding code prediction to other models: around 97% Micro-level AUC. 
More importantly, in the analysis of model explanations, by highlighting the most salient words and sentences for each label, HLAN showed more meaningful and comprehensive model interpretation compared to the CNN-based models and its downgraded baselines, HAN and HA-GRU. Label embedding (LE) initialisation significantly boosted the previous state-of-the-art model, CNN with attention mechanisms, on the full code prediction to 52.5% Micro-level F1. The analysis of the layers initialised with label embeddings further explains the effect of this initialisation approach. The source code of the implementation and the results are openly available at https://github.com/acadTags/Explainable-Automated-Medical-Coding. CONCLUSION: We draw the conclusion from the evaluation results and analyses. First, with hierarchical label-wise attention mechanisms, HLAN can provide better or comparable results for automated coding to the state-of-the-art, CNN-based models. Second, HLAN can provide more comprehensive explanations for each label by highlighting key words and sentences in the discharge summaries, compared to the n-grams in the CNN-based models and the downgraded baselines, HAN and HA-GRU. Third, the performance of deep learning based multi-label classification for automated coding can be consistently boosted by initialising label embeddings that captures the correlations among labels. We further discuss the advantages and drawbacks of the overall method regarding its potential to be deployed to a hospital and suggest areas for future studies.
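The core idea of label-wise attention, that each label computes its own attention weights over the words and those weights then serve as the explanation, can be sketched in plain Python. This is a single word-level attention layer with dot-product scoring, not the hierarchical HLAN architecture or the code in the authors' repository; the function names, the scoring function, and the toy vectors are all assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def label_wise_attention(hidden, label_queries):
    """Compute one attention distribution and context vector per label.

    hidden        : list of word representations (each a list of floats)
    label_queries : dict mapping a label to its query vector (hypothetical)
    """
    dim = len(hidden[0])
    result = {}
    for label, query in label_queries.items():
        # Dot-product relevance of every word to this label.
        scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden]
        weights = softmax(scores)
        # Label-specific document representation: attention-weighted sum of words.
        context = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
        result[label] = (weights, context)
    return result


# Three toy "word" vectors and two labels, each attending to a different word.
words = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
queries = {"label_a": [5.0, 0.0], "label_b": [0.0, 5.0]}
attended = label_wise_attention(words, queries)
```

Inspecting `attended["label_a"][0]` shows which words the model weighted most for that label. The full model described in the abstract applies the same principle at the sentence level as well.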

Subjects

COVID-19 , Clinical Coding/methods , Neural Networks, Computer , SARS-CoV-2 , COVID-19/epidemiology , Clinical Coding/statistics & numerical data , Deep Learning , Electronic Health Records/statistics & numerical data , Humans , Medical Informatics , Pandemics/statistics & numerical data , United Kingdom/epidemiology

8.

The reporting quality of natural language processing studies: systematic review of studies of radiology reports.

Davidson, Emma M; Poon, Michael T C; Casey, Arlene; Grivas, Andreas; Duma, Daniel; Dong, Hang; Suárez-Paniagua, Víctor; Grover, Claire; Tobin, Richard; Whalley, Heather; Wu, Honghan; Alex, Beatrice; Whiteley, William.

BMC Med Imaging ; 21(1): 142, 2021 10 02.

Article in English | MEDLINE | ID: mdl-34600486

ABSTRACT

BACKGROUND: Automated language analysis of radiology reports using natural language processing (NLP) can provide valuable information on patients' health and disease. With its rapid development, NLP studies should have transparent methodology to allow comparison of approaches and reproducibility. This systematic review aims to summarise the characteristics and reporting quality of studies applying NLP to radiology reports. METHODS: We searched Google Scholar for studies published in English that applied NLP to radiology reports of any imaging modality between January 2015 and October 2019. At least two reviewers independently performed screening and completed data extraction. We specified 15 criteria relating to data source, datasets, ground truth, outcomes, and reproducibility for quality assessment. The primary NLP performance measures were precision, recall and F1 score. RESULTS: Of the 4,836 records retrieved, we included 164 studies that used NLP on radiology reports. The commonest clinical applications of NLP were disease information or classification (28%) and diagnostic surveillance (27.4%). Most studies used English radiology reports (86%). Reports from mixed imaging modalities were used in 28% of the studies. Oncology (24%) was the most frequent disease area. Most studies had dataset size > 200 (85.4%) but the proportion of studies that described their annotated, training, validation, and test set were 67.1%, 63.4%, 45.7%, and 67.7% respectively. About half of the studies reported precision (48.8%) and recall (53.7%). Few studies reported external validation performed (10.8%), data availability (8.5%) and code availability (9.1%). There was no pattern of performance associated with the overall reporting quality. CONCLUSIONS: There is a range of potential clinical applications for NLP of radiology reports in health services and research. However, we found suboptimal reporting quality that precludes comparison, reproducibility, and replication. 
Our results support the need for development of reporting standards specific to clinical NLP studies.
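The primary performance measures named in this review (precision, recall and F1 score) follow directly from counts of true positives, false positives and false negatives. A minimal sketch, with a hypothetical function name and made-up counts:

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts.

    tp : correctly extracted items
    fp : extracted items that are wrong
    fn : items that should have been extracted but were missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Made-up example: 8 correct extractions, 2 spurious, 4 missed.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are reasonably high. Reporting only one of the two can therefore make results hard to compare across studies.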

Subjects

Natural Language Processing , Radiography , Radiology/standards , Datasets as Topic , Humans , Reproducibility of Results , Research Report/standards

9.

Developing automated methods for disease subtyping in UK Biobank: an exemplar study on stroke.

Rannikmäe, Kristiina; Wu, Honghan; Tominey, Steven; Whiteley, William; Allen, Naomi; Sudlow, Cathie.

BMC Med Inform Decis Mak ; 21(1): 191, 2021 06 15.

Article in English | MEDLINE | ID: mdl-34130677

ABSTRACT

BACKGROUND: Better phenotyping of routinely collected coded data would be useful for research and health improvement. For example, the precision of coded data for hemorrhagic stroke (intracerebral hemorrhage [ICH] and subarachnoid hemorrhage [SAH]) may be as poor as < 50%. This work aimed to investigate the feasibility and added value of automated methods applied to clinical radiology reports to improve stroke subtyping. METHODS: From a sub-population of 17,249 Scottish UK Biobank participants, we ascertained those with an incident stroke code in hospital, death record or primary care administrative data by September 2015, and ≥ 1 clinical brain scan report. We used a combination of natural language processing and clinical knowledge inference on brain scan reports to assign a stroke subtype (ischemic vs ICH vs SAH) for each participant and assessed performance by precision and recall at entity and patient levels. RESULTS: Of 225 participants with an incident stroke code, 207 had a relevant brain scan report and were included in this study. Entity level precision and recall ranged from 78 to 100%. Automated methods showed precision and recall at patient level that were very good for ICH (both 89%), good for SAH (both 82%), but, as expected, lower for ischemic stroke (73%, and 64%, respectively), suggesting coded data remains the preferred method for identifying the latter stroke subtype. CONCLUSIONS: Our automated method applied to radiology reports provides a feasible, scalable and accurate solution to improve disease subtyping when used in conjunction with administrative coded health data. Future research should validate these findings in a different population setting.

Subjects

Stroke , Subarachnoid Hemorrhage , Biological Specimen Banks , Cerebral Hemorrhage , Humans , Stroke/diagnostic imaging , Subarachnoid Hemorrhage/diagnostic imaging , Subarachnoid Hemorrhage/epidemiology , United Kingdom

10.

A systematic review of natural language processing applied to radiology reports.

Casey, Arlene; Davidson, Emma; Poon, Michael; Dong, Hang; Duma, Daniel; Grivas, Andreas; Grover, Claire; Suárez-Paniagua, Víctor; Tobin, Richard; Whiteley, William; Wu, Honghan; Alex, Beatrice.

BMC Med Inform Decis Mak ; 21(1): 179, 2021 06 03.

Article in English | MEDLINE | ID: mdl-34082729

ABSTRACT

BACKGROUND: Natural language processing (NLP) has a significant role in advancing healthcare and has been found to be key in extracting structured information from radiology reports. Understanding recent developments in NLP application to radiology is of significance but recent reviews on this are limited. This study systematically assesses and quantifies recent literature in NLP applied to radiology reports. METHODS: We conduct an automated literature search yielding 4836 results using automated filtering, metadata enriching steps and citation search combined with manual review. Our analysis is based on 21 variables including radiology characteristics, NLP methodology, performance, study, and clinical application characteristics. RESULTS: We present a comprehensive analysis of the 164 publications retrieved with publications in 2019 almost triple those in 2015. Each publication is categorised into one of 6 clinical application categories. Deep learning use increases in the period but conventional machine learning approaches are still prevalent. Deep learning remains challenged when data is scarce and there is little evidence of adoption into clinical practice. Despite 17% of studies reporting greater than 0.85 F1 scores, it is hard to comparatively evaluate these approaches given that most of them use different datasets. Only 14 studies made their data and 15 their code available with 10 externally validating results. CONCLUSIONS: Automated understanding of clinical narratives of the radiology reports has the potential to enhance the healthcare process and we show that research in this field continues to grow. Reproducibility and explainability of models are important if the domain is to move applications into clinical use. More could be done to share code enabling validation of methods on different institutional data and to reduce heterogeneity in reporting of study properties allowing inter-study comparisons. 
Our results have significance for researchers in the field providing a systematic synthesis of existing work to build on, identify gaps, opportunities for collaboration and avoid duplication.

Subjects

Radiology Information Systems , Radiology , Humans , Machine Learning , Natural Language Processing , Reproducibility of Results

11.

Deciphering the biodegradation of petroleum hydrocarbons using FTIR spectroscopy: application to a contaminated site.

Yang, Mingxing; Cao, Zhendong; Zhang, Yue; Wu, Honghan.

Water Sci Technol ; 80(7): 1315-1325, 2019 Oct.

Article in English | MEDLINE | ID: mdl-31850883

ABSTRACT

The chemical composition of groundwater in a petroleum-contaminated site is determined by the present functional groups and these play a vital role in a feasibility remediation technique. Based on the in situ investigation of a contaminated shallow groundwater in an oilfield, Fourier transform infrared (FTIR) spectroscopy associated with chemometric treatments, principal component analysis (PCA), and simple-to-use interactive self-modeling mixture analysis (SIMPLISMA), were used to decipher the biodegradation process by analyzing the conversion of functional groups. Environmental factors that can influence microbial metabolism were also evaluated for a comprehensive explanation. FTIR spectroscopy and PCA results showed that the contamination in the study area can be divided into three parts based on FTIR spectra: (1) regular contamination plume distribution and biodegradation level to fresh oil, (2) moderate biodegradation area, and (3) intensive biodegradation area. FTIR spectra further revealed the present functional groups as aliphatic, aromatic, and polar family compounds. SIMPLISMA was used to discuss the degree of biodegradation along the flow path quantitatively and qualitatively and elucidated that the aliphatic and aromatic compounds were mainly metabolized into polar compounds with nitrogen, sulfur, and oxygen via microbes. During metabolism, microbial indices, such as the Shannon-Weaver, Simpson, and Pielou indices, indicated that microbial diversity did not greatly change; hence, hydrocarbons were constantly consumed to feed dominant microbes. Dissolved oxygen concentrations decreased from 4.58 ± 0.31 mg/L (in monitoring well Z1) to 3.21 ± 0.26 mg/L (in monitoring well Z16) and then became constant in the down-gradient area, demonstrating that aerobic biodegradation was the dominant process at the up-gradient plume. 
Results were in accordance with the oxidation index, which continuously increased from 0.028 ± 0.013 (in monitoring well Z1) to 0.669 ± 0.047 (in monitoring well Z10), showing that oxygen was consumed along the flow path. Similarly, concentration changes in Fe²⁺, Mn²⁺, and SO₄²⁻ proved that the down-gradient area was in reduction condition.
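The microbial diversity indices mentioned in this abstract can be computed from taxon abundance counts as sketched below. The abstract does not state which formulations the authors used; this sketch assumes the natural-log Shannon index, the Gini-Simpson form (1 minus the sum of squared proportions), and Pielou evenness as Shannon divided by ln(S). The function name and the example counts are hypothetical.

```python
import math


def diversity_indices(counts):
    """Return (Shannon H, Gini-Simpson 1 - D, Pielou J) for a list of taxon counts."""
    counts = [c for c in counts if c > 0]  # absent taxa do not contribute
    total = sum(counts)
    props = [c / total for c in counts]
    shannon = -sum(p * math.log(p) for p in props)
    simpson = 1.0 - sum(p * p for p in props)
    # Evenness: observed Shannon diversity relative to its maximum, ln(S).
    pielou = shannon / math.log(len(props)) if len(props) > 1 else 0.0
    return shannon, simpson, pielou


# Hypothetical community with four equally abundant taxa (maximal evenness).
h, d, j = diversity_indices([10, 10, 10, 10])
```

Comparing such indices between monitoring wells is one way to judge whether community diversity changes along the flow path. Values are only comparable when the same formulation is used throughout.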

Subjects

Petroleum , Water Pollutants, Chemical , Biodegradation, Environmental , Hydrocarbons , Spectroscopy, Fourier Transform Infrared

12.

CogStack - experiences of deploying integrated information retrieval and extraction services in a large National Health Service Foundation Trust hospital.

Jackson, Richard; Kartoglu, Ismail; Stringer, Clive; Gorrell, Genevieve; Roberts, Angus; Song, Xingyi; Wu, Honghan; Agrawal, Asha; Lui, Kenneth; Groza, Tudor; Lewsley, Damian; Northwood, Doug; Folarin, Amos; Stewart, Robert; Dobson, Richard.

BMC Med Inform Decis Mak ; 18(1): 47, 2018 06 25.

Article in English | MEDLINE | ID: mdl-29941004

ABSTRACT

BACKGROUND: Traditional health information systems are generally devised to support clinical data collection at the point of care. However, as the significance of the modern information economy expands in scope and permeates the healthcare domain, there is an increasing urgency for healthcare organisations to offer information systems that address the expectations of clinicians, researchers and the business intelligence community alike. Amongst other emergent requirements, the principal unmet need might be defined as the 3R principle (right data, right place, right time) to address deficiencies in organisational data flow while retaining the strict information governance policies that apply within the UK National Health Service (NHS). Here, we describe our work on creating and deploying a low cost structured and unstructured information retrieval and extraction architecture within King's College Hospital, the management of governance concerns and the associated use cases and cost saving opportunities that such components present. RESULTS: To date, our CogStack architecture has processed over 300 million lines of clinical data, making it available for internal service improvement projects at King's College London. On generated data designed to simulate real world clinical text, our de-identification algorithm achieved up to 94% precision and up to 96% recall. CONCLUSION: We describe a toolkit which we feel is of huge value to the UK (and beyond) healthcare community. It is the only open source, easily deployable solution designed for the UK healthcare environment, in a landscape populated by expensive proprietary systems. Solutions such as these provide a crucial foundation for the genomic revolution in medicine.

Subjects

Electronic Health Records, Hospitals, Information Storage and Retrieval/methods, National Health Programs, Natural Language Processing, Humans, United Kingdom

13.

Machine learning on cardiotocography data to classify fetal outcomes: A scoping review.

Francis, Farah; Luz, Saturnino; Wu, Honghan; Stock, Sarah J; Townsend, Rosemary.

Comput Biol Med ; 172: 108220, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38489990

ABSTRACT

INTRODUCTION: Uterine contractions during labour constrict maternal blood flow and oxygen delivery to the developing baby, causing transient hypoxia. While most babies are physiologically adapted to withstand such intrapartum hypoxia, those exposed to severe hypoxia or with poor physiological reserves may experience neurological injury or death during labour. Cardiotocography (CTG) monitoring was developed to identify babies at risk of hypoxia by detecting changes in fetal heart rate (FHR) patterns. CTG monitoring is in widespread use in intrapartum care for the detection of fetal hypoxia, but the clinical utility is limited by a relatively poor positive predictive value (PPV) of an abnormal CTG and significant inter- and intra-observer variability in CTG interpretation. Clinical risk and human factors may impact the quality of CTG interpretation. Misclassification of CTG traces may lead to either under-treatment (with the risk of fetal injury or death) or over-treatment (which may include unnecessary operative interventions that put both mother and baby at risk of complications). Machine learning (ML) has been applied to this problem since early 2000 and has shown potential to predict fetal hypoxia more accurately than visual interpretation of CTG alone. To consider how these tools might be translated for clinical practice, we conducted a review of ML techniques already applied to CTG classification and identified research gaps requiring investigation in order to progress towards clinical implementation. MATERIALS AND METHOD: We used identified keywords to search databases for relevant publications on PubMed, EMBASE and IEEE Xplore. We used Preferred Reporting Items for Systematic Review and Meta-Analysis for Scoping Reviews (PRISMA-ScR). Title, abstract and full text were screened according to the inclusion criteria. RESULTS: We included 36 studies that used signal processing and ML techniques to classify CTG. Most studies used an open-access CTG database and predominantly used fetal metabolic acidosis as the benchmark for hypoxia with varying pH levels. Various methods were used to process and extract CTG signals and several ML algorithms were used to classify CTG. We identified significant concerns over the practicality of using varying pH levels as the CTG classification benchmark. Furthermore, studies needed to be more generalised as most used the same database with a low number of subjects for an ML study. CONCLUSION: ML studies demonstrate potential in predicting fetal hypoxia from CTG. However, more diverse datasets, standardisation of hypoxia benchmarks and enhancement of algorithms and features are needed for future clinical implementation.

Subjects

Cardiotocography, Obstetric Labor, Female, Humans, Pregnancy, Cardiotocography/methods, Fetal Hypoxia/diagnosis, Fetal Heart Rate/physiology, Uterine Contraction

14.

Applying contrastive pre-training for depression and anxiety risk prediction in type 2 diabetes patients based on heterogeneous electronic health records: a primary healthcare case study.

Feng, Wei; Wu, Honghan; Ma, Hui; Tao, Zhenhuan; Xu, Mengdie; Zhang, Xin; Lu, Shan; Wan, Cheng; Liu, Yun.

J Am Med Inform Assoc ; 31(2): 445-455, 2024 Jan 18.

Article in English | MEDLINE | ID: mdl-38062850

ABSTRACT

OBJECTIVE: Due to heterogeneity and limited medical data in primary healthcare services (PHS), assessing the psychological risk of type 2 diabetes mellitus (T2DM) patients in PHS is difficult. Using unsupervised contrastive pre-training, we proposed a deep learning framework named depression and anxiety prediction (DAP) to predict depression and anxiety in T2DM patients. MATERIALS AND METHODS: The DAP model consists of two sub-models. Firstly, the pre-trained model of DAP used unlabeled discharge records of 85 085 T2DM patients from the First Affiliated Hospital of Nanjing Medical University for unsupervised contrastive learning on heterogeneous electronic health records (EHRs). Secondly, the fine-tuned model of DAP used case-control cohorts (17 491 patients) selected from 149 596 T2DM patients' EHRs in the Nanjing Health Information Platform (NHIP). The DAP model was validated in 1028 patients from PHS in NHIP. Evaluation included receiver operating characteristic area under the curve (ROC-AUC), precision-recall area under the curve (PR-AUC), and decision curve analysis (DCA). RESULTS: The pre-training step allowed the DAP model to converge at a faster rate. The fine-tuned DAP model significantly outperformed the baseline models (logistic regression, extreme gradient boosting, and random forest) with ROC-AUC of 0.91 ± 0.028 and PR-AUC of 0.80 ± 0.067 in 10-fold internal validation, and with ROC-AUC of 0.75 ± 0.045 and PR-AUC of 0.47 ± 0.081 in external validation. The DCA indicated the clinical potential of the DAP model. CONCLUSION: The DAP model effectively predicted post-discharge depression and anxiety in T2DM patients from PHS, reducing data fragmentation and limitations. This study highlights the DAP model's potential for early detection and intervention in depression and anxiety, improving outcomes for diabetes patients.

Subjects

Type 2 Diabetes Mellitus, Humans, Type 2 Diabetes Mellitus/complications, Type 2 Diabetes Mellitus/diagnosis, Electronic Health Records, Aftercare, Depression, Machine Learning, Patient Discharge, Anxiety

15.

A unidirectional water-transport antibacterial bilayer nanofibrous dressing based on chitosan for accelerating wound healing.

Wu, Hengpeng; Gao, Botao; Wu, Honghan; Song, Jiaxiang; Zhu, Li; Zhou, Meng; Linghu, Xitao; Huang, Shuai; Zhou, Zongbao; Wa, Qingde.

Int J Biol Macromol ; 269(Pt 2): 131878, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38692530

ABSTRACT

Excessive accumulation of exudate from wounds often causes infection and hinders skin regeneration. To handle wound exudate quickly and prevent infection, we developed an antibacterial Janus nanofibrous dressing with a unidirectional water-transport function. The dressing consists of a hydrophilic chitosan aerogel (CS-A) as the outer layer and a hydrophobic laurylated chitosan (La-CS) nanofibrous membrane as the inner layer. These dressings achieved excellent liquid absorption performance (2987.8 ± 123.5%), air and moisture permeability (997.8 ± 23.1 g/m²/day) and mechanical strength (5.1 ± 2.6 MPa). This performance was obtained by adjusting the density of CS-A and the thickness of the La-CS membrane. Moreover, the dressing did not induce significant toxicity to cells and can prevent bacterial aggregation and infection at the wound site. Animal experiments showed that the dressing can shorten the inflammatory phase, enhance blood vessel generation, and accelerate collagen deposition, thus promoting wound healing. Overall, these results suggest that this Janus dressing is a promising material for clinical wound care.

Subjects

Anti-Bacterial Agents, Bandages, Chitosan, Nanofibers, Water, Wound Healing, Chitosan/chemistry, Chitosan/pharmacology, Wound Healing/drug effects, Nanofibers/chemistry, Anti-Bacterial Agents/pharmacology, Anti-Bacterial Agents/chemistry, Animals, Water/chemistry, Mice, Hydrophobic and Hydrophilic Interactions, Permeability, Rats, Staphylococcus aureus/drug effects, Male

16.

The impact of inconsistent human annotations on AI driven clinical decision making.

Sylolypavan, Aneeta; Sleeman, Derek; Wu, Honghan; Sim, Malcolm.

NPJ Digit Med ; 6(1): 26, 2023 Feb 21.

Article in English | MEDLINE | ID: mdl-36810915

ABSTRACT

In supervised learning model development, domain experts are often used to provide the class labels (annotations). Annotation inconsistencies commonly occur when even highly experienced clinical experts annotate the same phenomenon (e.g., medical image, diagnostics, or prognostic status), due to inherent expert bias, judgments, and slips, among other factors. While their existence is relatively well-known, the implications of such inconsistencies are largely understudied in real-world settings, when supervised learning is applied on such 'noisy' labelled data. To shed light on these issues, we conducted extensive experiments and analyses on three real-world Intensive Care Unit (ICU) datasets. Specifically, individual models were built from a common dataset, annotated independently by 11 Glasgow Queen Elizabeth University Hospital ICU consultants, and model performance estimates were compared through internal validation (Fleiss' κ = 0.383, i.e., fair agreement). Further, broad external validation (on both static and time series datasets) of these 11 classifiers was carried out on a HiRID external dataset, where the models' classifications were found to have low pairwise agreements (average Cohen's κ = 0.255, i.e., minimal agreement). Moreover, they tend to disagree more on making discharge decisions (Fleiss' κ = 0.174) than predicting mortality (Fleiss' κ = 0.267). Given these inconsistencies, further analyses were conducted to evaluate the current best practices in obtaining gold-standard models and determining consensus. The results suggest that: (a) there may not always be a "super expert" in acute clinical settings (using internal and external validation model performances as a proxy); and (b) standard consensus seeking (such as majority vote) consistently leads to suboptimal models. Further analysis, however, suggests that assessing annotation learnability and using only 'learnable' annotated datasets for determining consensus achieves optimal models in most cases.
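
Note on the agreement statistics above: the abstract reports pairwise Cohen's κ between classifiers. As a minimal sketch of how that statistic is computed for two raters (the function name and the toy labels are illustrative assumptions, not taken from the paper or its code), κ compares observed agreement with the agreement expected by chance from each rater's label frequencies:

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy example (hypothetical labels): 3 of 4 items agree, chance agreement is 0.5.
kappa = cohen_kappa([1, 1, 1, 0], [1, 1, 0, 0])
```

Averaging this value over all pairs of classifiers would give a figure comparable in kind to the average pairwise κ quoted in the abstract; Fleiss' κ generalises the same idea to more than two raters.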

17.

Machine Learning to Classify Cardiotocography for Fetal Hypoxia Detection.

Francis, Farah; Luz, Saturnino; Wu, Honghan; Townsend, Rosemary; Stock, Sarah S.

Annu Int Conf IEEE Eng Med Biol Soc ; 2023: 1-4, 2023 07.

Article in English | MEDLINE | ID: mdl-38083272

ABSTRACT

Fetal hypoxia can have damaging consequences for babies, such as stillbirth and cerebral palsy. Cardiotocography (CTG) has been used to detect intrapartum fetal hypoxia during labor. It is a non-invasive machine that measures the fetal heart rate and uterine contractions. Visual CTG interpretation suffers from inconsistencies among clinicians that can delay interventions. Machine learning (ML) has shown potential in classifying abnormal CTG, allowing automatic interpretation. In the absence of a gold standard, researchers have used various surrogate biomarkers to classify CTG, some of which were clinically irrelevant. We proposed using Apgar scores as the surrogate benchmark of babies' ability to recover from birth. Apgar scores measure newborns' ability to recover from active uterine contraction by assessing appearance, pulse, grimace, activity and respiration. The higher the Apgar score, the healthier the baby is. We employed signal processing methods to pre-process and extract validated features from 552 raw CTG recordings. We also included CTG-specific characteristics as outlined in the NICE guidelines. We employed ML techniques using 22 features and compared performance across ML classifiers. While we found that ML can distinguish CTG with low Apgar scores, results for the lowest Apgar scores, which are rare in the dataset we used, would benefit from more CTG data for better performance. We need an external dataset to validate our model for generalizability to ensure that it does not overfit a specific population. Clinical Relevance: This study demonstrated the potential of using a clinically relevant benchmark for classifying CTG to allow automatic early detection of hypoxia to reduce decision-making time in maternity units.

Subjects

Newborn Diseases, Obstetric Labor, Infant, Pregnancy, Newborn, Female, Humans, Cardiotocography/methods, Fetal Hypoxia/diagnosis, Uterine Contraction, Hypoxia/diagnosis

18.

Antidepressant and antipsychotic drug prescribing and diabetes outcomes: A systematic review of observational studies.

Greene, Charlotte R L; Ward-Penny, Hanna; Ioannou, Marianna F; Wild, Sarah H; Wu, Honghan; Smith, Daniel J; Jackson, Caroline A.

Diabetes Res Clin Pract ; 199: 110649, 2023 May.

Article in English | MEDLINE | ID: mdl-37004975

ABSTRACT

AIMS: Psychotropic medication may be associated with adverse effects, including among people with diabetes. We conducted a systematic review of observational studies investigating the association between antidepressant or antipsychotic drug prescribing and type 2 diabetes outcomes. METHODS: We systematically searched PubMed, EMBASE, and PsycINFO to 15th August 2022 to identify eligible studies. We used the Newcastle-Ottawa scale to assess study quality and performed a narrative synthesis. RESULTS: We included 18 studies, 14 reporting on antidepressants and four on antipsychotics. There were 11 cohort studies, one self-controlled before and after study, two case-control studies, and four cross-sectional studies, of variable quality with highly heterogeneous study populations, exposure definitions, and outcomes analysed. Antidepressant prescribing may be associated with increased risk of macrovascular disease, whilst evidence on antidepressant and antipsychotic prescribing and glycaemic control was mixed. Few studies reported microvascular outcomes and risk factors other than glycaemic control. CONCLUSIONS: Studies of antidepressant and antipsychotic drug prescribing in relation to diabetes outcomes are scarce, with shortcomings and mixed findings. Until further evidence is available, people with diabetes prescribed antidepressants and antipsychotics should receive monitoring and appropriate treatment of risk factors and screening for complications as recommended in general diabetes guidelines.

Subjects

Antipsychotic Agents, Type 2 Diabetes Mellitus, Humans, Antipsychotic Agents/adverse effects, Type 2 Diabetes Mellitus/drug therapy, Type 2 Diabetes Mellitus/chemically induced, Cross-Sectional Studies, Antidepressive Agents/adverse effects, Case-Control Studies

19.

Prediction of disease comorbidity using explainable artificial intelligence and machine learning techniques: A systematic review.

Alsaleh, Mohanad M; Allery, Freya; Choi, Jung Won; Hama, Tuankasfee; McQuillin, Andrew; Wu, Honghan; Thygesen, Johan H.

Int J Med Inform ; 175: 105088, 2023 07.

Article in English | MEDLINE | ID: mdl-37156169

ABSTRACT

OBJECTIVE: Disease comorbidity is a major challenge in healthcare affecting the patient's quality of life and costs. AI-based prediction of comorbidities can overcome this issue by improving precision medicine and providing holistic care. The objective of this systematic literature review was to identify and summarise existing machine learning (ML) methods for comorbidity prediction and evaluate the interpretability and explainability of the models. MATERIALS AND METHODS: The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework was used to identify articles in three databases: Ovid Medline, Web of Science and PubMed. The literature search covered a broad range of terms for the prediction of disease comorbidity and ML, including traditional predictive modelling. RESULTS: Of 829 unique articles, 58 full-text papers were assessed for eligibility. A final set of 22 articles with 61 ML models was included in this review. Of the identified ML models, 33 models achieved relatively high accuracy (80-95%) and AUC (0.80-0.89). Overall, 72% of studies had high or unclear concerns regarding the risk of bias. DISCUSSION: This systematic review is the first to examine the use of ML and explainable artificial intelligence (XAI) methods for comorbidity prediction. The chosen studies focused on a limited scope of comorbidities ranging from 1 to 34 (mean = 6), and no novel comorbidities were found due to limited phenotypic and genetic data. The lack of standard evaluation for XAI hinders fair comparisons. CONCLUSION: A broad range of ML methods has been used to predict the comorbidities of various disorders. With further development of explainable ML capacity in the field of comorbidity prediction, there is a significant possibility of identifying unmet health needs by highlighting comorbidities in patient groups that were not previously recognised to be at risk for particular comorbidities.

Subjects

Artificial Intelligence, Quality of Life, Humans, Machine Learning, Comorbidity, Eligibility Determination

20.

FLAP: a framework for linking free-text addresses to the Ordnance Survey Unique Property Reference Number database.

Zhang, Huayu; Casey, Arlene; Guellil, Imane; Suárez-Paniagua, Víctor; MacRae, Clare; Marwick, Charis; Wu, Honghan; Guthrie, Bruce; Alex, Beatrice.

Front Digit Health ; 5: 1186208, 2023.

Article in English | MEDLINE | ID: mdl-38090654

ABSTRACT

Introduction: Linking free-text addresses to unique identifiers in a structural address database [the Ordnance Survey unique property reference number (UPRN) in the United Kingdom (UK)] is a necessary step for downstream geospatial analysis in many digital health systems, e.g., for identification of care home residents, understanding housing transitions in later life, and informing decision making on geographical health and social care resource distribution. However, there is a lack of open-source tools for this task with performance validated in a test data set. Methods: In this article, we propose a generalisable solution (A Framework for Linking free-text Addresses to Ordnance Survey UPRN database, FLAP) based on a machine learning-based matching classifier coupled with a fuzzy aligning algorithm for feature generation with better performance than existing tools. The framework is implemented in Python as an Open Source tool (available at Link). We tested the framework in a real-world scenario of linking individuals' (n=771,588) addresses recorded as free text in the Community Health Index (CHI) of National Health Service (NHS) Tayside and NHS Fife to the Unique Property Reference Number database (UPRN DB). Results: We achieved an adjusted matching accuracy of 0.992 in a test data set randomly sampled (n=3,876) from NHS Tayside and NHS Fife CHI addresses. FLAP showed robustness against input variations including typographical errors, alternative formats, and partially incorrect information. It also has improved usability compared to existing solutions, allowing the use of a customised threshold of matching confidence and selection of top n candidate records. The use of machine learning also provides better adaptability of the tool to new data and enables continuous improvement. Discussion: In conclusion, we have developed a framework, FLAP, for linking free-text UK addresses to the UPRN DB with good performance and usability in a real-world task.
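
Note on the matching idea above: the abstract mentions a customised confidence threshold and selection of the top n candidate records. The following is only a rough sketch of that general pattern and is not FLAP's implementation: it swaps in plain string similarity from Python's standard `difflib` in place of the paper's fuzzy aligner and machine-learning classifier, and the function names, record identifiers, and addresses are all hypothetical.

```python
from difflib import SequenceMatcher


def normalise(address):
    """Lower-case, drop commas and collapse whitespace (a deliberately simple cleaner)."""
    return " ".join(address.lower().replace(",", " ").split())


def similarity(a, b):
    """String similarity in [0, 1]; a stand-in for a learned match confidence."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def top_candidates(query, reference, n=3, threshold=0.6):
    """Return up to n (record_id, score) pairs scoring at least `threshold`, best first."""
    scored = [(similarity(query, text), key) for key, text in reference.items()]
    kept = [(score, key) for score, key in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [(key, round(score, 3)) for score, key in kept[:n]]


# Hypothetical reference records keyed by made-up identifiers.
reference = {
    "id1": "12 High Street, Dundee",
    "id2": "99 Low Road, Perth",
}
matches = top_candidates("12 high stret dundee", reference)
```

Raising `threshold` trades recall for precision, and `n` controls how many candidates are passed on for manual or downstream review, which mirrors the two usability controls the abstract describes.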
