ABSTRACT
OBJECTIVE: To develop a soft prompt-based learning architecture for large language models (LLMs), examine prompt-tuning using frozen/unfrozen LLMs, and assess their abilities in transfer learning and few-shot learning. METHODS: We developed a soft prompt-based learning architecture and compared 4 strategies: (1) fine-tuning without prompts; (2) hard-prompting with unfrozen LLMs; (3) soft-prompting with unfrozen LLMs; and (4) soft-prompting with frozen LLMs. We evaluated GatorTron, a clinical LLM with up to 8.9 billion parameters, and compared GatorTron with 4 existing transformer models for clinical concept and relation extraction on 2 benchmark datasets for adverse drug events and social determinants of health (SDoH). We evaluated the few-shot learning ability and generalizability for cross-institution applications. RESULTS AND CONCLUSION: When LLMs are unfrozen, GatorTron-3.9B with soft prompting achieves the best strict F1-scores of 0.9118 and 0.8604 for concept extraction, outperforming the traditional fine-tuning and hard prompt-based models by 0.6%-3.1% and 1.2%-2.9%, respectively; GatorTron-345M with soft prompting achieves the best F1-scores of 0.8332 and 0.7488 for end-to-end relation extraction, outperforming the other two models by 0.2%-2% and 0.6%-11.7%, respectively. When LLMs are frozen, small LLMs lag far behind unfrozen models; scaling LLMs up to billions of parameters makes frozen LLMs competitive with unfrozen models. Soft prompting with a frozen GatorTron-8.9B model achieved the best performance for cross-institution evaluation.
We demonstrate that (1) machines can learn soft prompts better than hard prompts composed by humans, (2) frozen LLMs have good few-shot learning ability and generalizability for cross-institution applications, (3) frozen LLMs reduce computing cost to 2.5%-6% of previous methods using unfrozen LLMs, and (4) frozen LLMs require large models (e.g., over several billions of parameters) for good performance.
Subjects
Natural Language Processing, Humans, Machine Learning, Data Mining/methods, Algorithms, Social Determinants of Health, Drug-Related Side Effects and Adverse Reactions
ABSTRACT
OBJECTIVE: To develop a natural language processing (NLP) package to extract social determinants of health (SDoH) from clinical narratives, examine the bias among race and gender groups, test the generalizability of extracting SDoH for different disease groups, and examine the population-level extraction ratio. METHODS: We developed SDoH corpora using clinical notes identified at the University of Florida (UF) Health. We systematically compared 7 transformer-based large language models (LLMs) and developed an open-source package - SODA (i.e., SOcial DeterminAnts) to facilitate SDoH extraction from clinical narratives. We examined the performance and potential bias of SODA for different race and gender groups, tested the generalizability of SODA using two disease domains including cancer and opioid use, and explored strategies for improvement. We applied SODA to extract 19 categories of SDoH from the breast (n = 7,971), lung (n = 11,804), and colorectal cancer (n = 6,240) cohorts to assess the patient-level extraction ratio and examine the differences among race and gender groups. RESULTS: We developed an SDoH corpus using 629 clinical notes of cancer patients with annotations of 13,193 SDoH concepts/attributes from 19 categories of SDoH, and another cross-disease validation corpus using 200 notes from opioid use patients with 4,342 SDoH concepts/attributes. We compared 7 transformer models and the GatorTron model achieved the best mean average strict/lenient F1 scores of 0.9122 and 0.9367 for SDoH concept extraction and 0.9584 and 0.9593 for linking attributes to SDoH concepts. There is a small performance gap (~4%) between males and females, but a large performance gap (>16%) among race groups. The performance dropped when we applied the cancer SDoH model to the opioid cohort; fine-tuning using a smaller opioid SDoH corpus improved the performance.
The extraction ratio varied in the three cancer cohorts, in which 10 SDoH could be extracted from over 70% of cancer patients, but 9 SDoH could be extracted from less than 70% of cancer patients. Individuals from the White and Black groups have a higher extraction ratio than other minority race groups. CONCLUSIONS: Our SODA package achieved good performance in extracting 19 categories of SDoH from clinical narratives. The SODA package with pre-trained transformer models is available at https://github.com/uf-hobi-informatics-lab/SODA_Docker.
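The strict and lenient F1 scores reported above can be illustrated with a minimal sketch. It assumes the common convention that a strict match requires identical span boundaries and concept type, while a lenient match requires only overlapping boundaries and the same type; the abstract does not spell out its scoring rules, and the span tuples and category names used in the test data are hypothetical.

```python
from typing import List, Tuple

# (start offset, end offset exclusive, concept type); illustrative format only
Span = Tuple[int, int, str]


def _is_match(pred: Span, gold: Span, lenient: bool) -> bool:
    """Types must agree; boundaries must be equal (strict) or overlap (lenient)."""
    if pred[2] != gold[2]:
        return False
    if lenient:
        return pred[0] < gold[1] and gold[0] < pred[1]
    return (pred[0], pred[1]) == (gold[0], gold[1])


def f1_score(gold: List[Span], pred: List[Span], lenient: bool = False) -> float:
    """Span-level F1 under the assumed strict or lenient matching rule."""
    matched_pred = sum(any(_is_match(p, g, lenient) for g in gold) for p in pred)
    matched_gold = sum(any(_is_match(p, g, lenient) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```

A prediction that captures only part of a gold span counts as an error under strict scoring but as a hit under lenient scoring, which is why lenient scores are never lower than strict ones.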
Subjects
Narration, Natural Language Processing, Social Determinants of Health, Humans, Female, Male, Bias, Electronic Health Records, Documentation/methods, Data Mining/methods
ABSTRACT
Artificial intelligence (AI) has become a crucial element of modern technology, especially in the healthcare sector, as is apparent from the continuous development of large language models (LLMs), which are utilized in various domains, including medicine. However, using these LLMs in the medical domain requires an evaluation platform to determine their suitability and drive future development efforts. To that end, this study develops a comprehensive Multi-Criteria Decision Making (MCDM) approach that is specifically designed to evaluate medical LLMs. The success of AI, particularly LLMs, in the healthcare domain depends on their efficacy, safety, and ethical compliance. Therefore, it is essential to have a robust evaluation framework for their integration into medical contexts. This study proposes using the Fuzzy-Weighted Zero-InConsistency (FWZIC) method extended to the p, q-quasirung orthopair fuzzy set (p, q-QROFS) for weighting evaluation criteria. This extension enables the handling of uncertainties inherent in medical decision-making processes. The approach accommodates the imprecise and multifaceted nature of real-world medical data and criteria by incorporating fuzzy logic principles. The Multi-Attributive Ideal-Real Comparative Analysis (MAIRCA) method is employed for the assessment of the medical LLMs utilized in the case study of this research. The results revealed that the "Medical Relation Extraction" criterion with its sub-levels had more importance (0.504) than "Clinical Concept Extraction" (0.495). For the LLMs evaluated, out of 6 alternatives, (A4) "GatorTron S 10B" ranked 1st, whereas (A1) "GatorTron 90B" ranked 6th. The implications of this study extend beyond academic discourse, directly impacting healthcare practices and patient outcomes.
The proposed framework can help healthcare professionals make more informed decisions regarding the adoption and utilization of LLMs in medical settings.
Subjects
Artificial Intelligence, Fuzzy Logic, Humans, Decision Making
ABSTRACT
Rising incidence and mortality of cancer have led to an increasing amount of research in the field. To learn from preexisting data, it has become important to capture maximum information related to disease type, stage, treatment, and outcomes. Medical imaging reports are rich in this kind of information but are only present as free text. The extraction of information from such unstructured text reports is labor-intensive. The use of Natural Language Processing (NLP) tools to extract information from radiology reports can make it less time-consuming as well as more effective. In this study, we have developed and compared different models for the classification of lung carcinoma reports using clinical concepts. This study was approved by the institutional ethics committee as a retrospective study with a waiver of informed consent. A clinical concept-based classification pipeline for lung carcinoma radiology reports was developed using rule-based as well as machine learning models, and the models were compared. The machine learning models used were XGBoost and two deep learning architectures based on bidirectional long short-term memory (Bi-LSTM) neural networks. A corpus consisting of 1700 radiology reports, including computed tomography (CT) and positron emission tomography/computed tomography (PET/CT) reports, was used for development and testing. Five hundred one radiology reports from the MIMIC-III Clinical Database version 1.4 were used for external validation. The pipeline achieved an overall F1 score of 0.94 on the internal set and 0.74 on external validation, with the rule-based algorithm using expert input giving the best performance. Among the machine learning models, the Bi-LSTM_dropout model performed better than the XGBoost model and the Bi-LSTM_simple model on the internal set, whereas on external validation, the Bi-LSTM_simple model performed relatively better than the other 2.
This pipeline can be used for clinical concept-based classification of radiology reports related to lung carcinoma from a huge corpus and also for automated annotation of these reports.
Subjects
Carcinoma, Radiology, Humans, Retrospective Studies, Positron Emission Tomography Computed Tomography, Natural Language Processing, Lung
ABSTRACT
Named Entity Recognition (NER) or the extraction of concepts from clinical text is the task of identifying entities in text and slotting them into categories such as problems, treatments, tests, clinical departments, occurrences (such as admission and discharge) and others. NER forms a critical component of processing and leveraging unstructured data from Electronic Health Records (EHR). While identifying the spans and categories of concepts is itself a challenging task, these entities could also have attributes such as negation that pivot their meanings implied to the consumers of the named entities. There has been little research dedicated to identifying the entities and their qualifying attributes together. This research hopes to contribute to the area of detecting entities and their corresponding attributes by modelling the NER task as a supervised, multi-label tagging problem with each of the attributes assigned tagging sequence labels. In this paper, we propose 3 architectures to achieve this multi-label entity tagging: BiLSTM n-CRF, BiLSTM-CRF-Smax-TF and BiLSTM n-CRF-TF. We evaluate these methods on the 2010 i2b2/VA and the i2b2 2012 shared task datasets. Our different models obtain best NER scores of 0.903 and 0.808 on the i2b2 2010/VA and i2b2 2012 respectively. The highest span based micro-averaged F1 polarity scores obtained were 0.832 and 0.836 on the i2b2 2010/VA and i2b2 2012 datasets respectively, and the highest macro-averaged F1 polarity scores obtained were 0.924 and 0.888 respectively. The modality studies conducted on i2b2 2012 dataset revealed high scores of 0.818 and 0.501 for span based micro-averaged F1 and macro-averaged F1 respectively.
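The idea of tagging entities and their attributes as parallel label sequences can be sketched as follows. This is only an illustration of the data representation, assuming a BIO scheme in which each attribute (for example polarity) gets its own tag sequence aligned with the entity tags; the abstract does not give the exact label format, and the function name, tuple layout, and label values here are hypothetical.

```python
from typing import Dict, List, Tuple

# (start token, end token exclusive, entity category, {attribute name: value})
Entity = Tuple[int, int, str, Dict[str, str]]


def to_tag_sequences(tokens: List[str], entities: List[Entity]) -> Dict[str, List[str]]:
    """Build one BIO sequence for entity categories plus one per attribute."""
    n = len(tokens)
    attribute_names = sorted({name for entity in entities for name in entity[3]})
    sequences = {"entity": ["O"] * n}
    for name in attribute_names:
        sequences[name] = ["O"] * n
    for start, end, category, attributes in entities:
        for i in range(start, end):
            prefix = "B-" if i == start else "I-"
            sequences["entity"][i] = prefix + category
            # Each attribute value is tagged over the same token span as its entity.
            for name, value in attributes.items():
                sequences[name][i] = prefix + value
    return sequences
```

With this layout, a multi-label tagger predicts every sequence for the same tokens, so an entity and its qualifying attributes are recovered together rather than in separate passes.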
Subjects
Electronic Health Records, Patient Discharge, Humans, Natural Language Processing
ABSTRACT
The paper presents a method for recommending augmentations against conceptual gaps in textbooks. Question Answer (QA) pairs from community question-answering (cQA) forums are noted to offer precise and comprehensive illustrations of concepts. Our proposed method retrieves QA pairs for a target concept to suggest two types of augmentations: basic and supplementary. Basic augmentations are suggested for the concepts on which a textbook lacks fundamental references. We identified such deficiencies by employing a supervised machine learning-based approach trained on 12 features concerning the textbook's discourse. Supplementary augmentations aiming for additional references are suggested for all the concepts. Retrieved QA pairs were filtered to ensure their comprehensiveness for the target students. The proposed augmentation system was deployed using a web-based interface. We collected 28 Indian textbooks and manually curated them to create gold standards for assessing our proposed system. Analyzing expert opinions and adopting an equivalent pretest-posttest setup for the students, the quality of these augmentations was quantified. We evaluated the usability of the interface from students' responses. Both system and human-based evaluations indicated that the suggested augmentations addressed the concept-specific deficiency and provided additional materials to stimulate learning interest. The learning interface was easy-to-use and showcased these augmentations effectively.
ABSTRACT
The rapid evolution of the COVID-19 pandemic has underscored the need to quickly disseminate the latest clinical knowledge during a public-health emergency. One surprisingly effective platform for healthcare professionals (HCPs) to share knowledge and experiences from the front lines has been social media (for example, the "#medtwitter" community on Twitter). However, identifying clinically-relevant content in social media without manual labeling is a challenge because of the sheer volume of irrelevant data. We present an unsupervised, iterative approach to mine clinically relevant information from social media data, which begins by heuristically filtering for HCP-authored texts and incorporates topic modeling and concept extraction with MetaMap. This approach identifies granular topics and tweets with high clinical relevance from a set of about 52 million COVID-19-related tweets from January to mid-June 2020. We also show that because the technique does not require manual labeling, it can be used to identify emerging topics on a week-to-week basis. Our method can aid in future public-health emergencies by facilitating knowledge transfer among healthcare workers in a rapidly-changing information environment, and by providing an efficient and unsupervised way of highlighting potential areas for clinical research.
Subjects
COVID-19, Social Media, Humans, Information Storage and Retrieval, Pandemics, SARS-CoV-2
ABSTRACT
BACKGROUND: Concept extraction, a subdomain of natural language processing (NLP) with a focus on extracting concepts of interest, has been adopted to computationally extract clinical information from text for a wide range of applications ranging from clinical decision support to care quality improvement. OBJECTIVES: In this literature review, we provide a methodology review of clinical concept extraction, aiming to catalog development processes, available methods and tools, and specific considerations when developing clinical concept extraction applications. METHODS: Based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, a literature search was conducted for retrieving EHR-based information extraction articles written in English and published from January 2009 through June 2019 from Ovid MEDLINE In-Process & Other Non-Indexed Citations, Ovid MEDLINE, Ovid EMBASE, Scopus, Web of Science, and the ACM Digital Library. RESULTS: A total of 6,686 publications were retrieved. After title and abstract screening, 228 publications were selected. The methods used for developing clinical concept extraction applications were discussed in this review.
Subjects
Information Storage and Retrieval, Natural Language Processing, Bibliometrics, Research Design
ABSTRACT
Our goal is to summarise and aggregate information from social media regarding the symptoms of a disease, the drugs used and the treatment effects, both positive and negative. To achieve this, we first apply a supervised machine learning method to automatically extract medical concepts from natural language text. In an environment such as social media, where new data is continuously streamed, we need a methodology that allows us to train continuously on the new data. To attain such incremental re-training, a semi-supervised methodology is developed, which is capable of learning new concepts from a small set of labelled data together with the much larger set of unlabelled data. The semi-supervised methodology deploys a conditional random field (CRF) as the baseline training algorithm for extracting medical concepts. The methodology iteratively adds high-confidence sentences to the training set, and adds terms to existing dictionaries to be used as features with the baseline model for further classification. Our empirical results show that the baseline CRF performs strongly across a range of different dictionary and training sizes; when the baseline is built with the full training data, the F1 score reaches the range 84%-90%. Moreover, we show that the semi-supervised method produces a mild but significant improvement over the baseline. We also discuss the significance of the potential improvement of the semi-supervised methodology and find that it is significantly more accurate in most cases than the underlying baseline model.
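The iterative augmentation described above follows the general self-training pattern, which can be sketched in a few lines. This is not the authors' system: a toy nearest-centroid classifier on a single numeric feature stands in for the CRF, the dictionary-expansion step is omitted, and the confidence formula and the 0.9 threshold are assumptions chosen only to make the loop runnable.

```python
from typing import Callable, Dict, List, Tuple

Labeled = Tuple[float, str]


def centroid_model(labeled: List[Labeled]) -> Callable[[float], Tuple[str, float]]:
    """Toy stand-in for the CRF: nearest class centroid (needs >= 2 classes)."""
    groups: Dict[str, List[float]] = {}
    for x, y in labeled:
        groups.setdefault(y, []).append(x)
    centroids = {y: sum(xs) / len(xs) for y, xs in groups.items()}

    def predict(x: float) -> Tuple[str, float]:
        ranked = sorted(centroids, key=lambda y: abs(x - centroids[y]))
        d_best = abs(x - centroids[ranked[0]])
        d_next = abs(x - centroids[ranked[1]])
        total = d_best + d_next
        # Confidence is 1.0 on top of the best centroid, 0.5 midway between two.
        confidence = 1.0 - d_best / total if total > 0 else 0.5
        return ranked[0], confidence

    return predict


def self_train(labeled: List[Labeled], unlabeled: List[float],
               threshold: float = 0.9, max_rounds: int = 5):
    """Repeatedly retrain and move high-confidence predictions into the training set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        predict = centroid_model(labeled)
        confident, remaining = [], []
        for x in pool:
            label, confidence = predict(x)
            if confidence >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        pool = remaining
    return labeled, pool
```

The threshold controls the trade-off the abstract hints at: a high value adds fewer but more reliable pseudo-labelled examples per round.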
Subjects
Social Media, Algorithms, Humans, Language, Supervised Machine Learning
ABSTRACT
BACKGROUND: Previous state-of-the-art systems on Drug Name Recognition (DNR) and Clinical Concept Extraction (CCE) have focused on a combination of text "feature engineering" and conventional machine learning algorithms such as conditional random fields and support vector machines. However, developing good features is inherently time-consuming. Conversely, more modern machine learning approaches such as recurrent neural networks (RNNs) have proved capable of automatically learning effective features from either random assignments or automated word "embeddings". OBJECTIVES: (i) To create a highly accurate DNR and CCE system that avoids conventional, time-consuming feature engineering. (ii) To create richer, more specialized word embeddings by using health domain datasets such as MIMIC-III. (iii) To evaluate our systems over three contemporary datasets. METHODS: Two deep learning methods, namely the Bidirectional LSTM and the Bidirectional LSTM-CRF, are evaluated. A CRF model is set as the baseline to compare the deep learning systems to a traditional machine learning approach. The same features are used for all the models. RESULTS: We have obtained the best results with the Bidirectional LSTM-CRF model, which has outperformed all previously proposed systems. The specialized embeddings have helped to cover unusual words in DrugBank and MedLine, but not in the i2b2/VA dataset. CONCLUSIONS: We present a state-of-the-art system for DNR and CCE. Automated word embeddings have allowed us to avoid costly feature engineering and achieve higher accuracy. Nevertheless, the embeddings need to be retrained over datasets that are adequate for the domain, in order to cover the domain-specific vocabulary.
Subjects
Databases, Factual, Neural Networks, Computer, Algorithms, Humans, Machine Learning
ABSTRACT
Machine learning methods usually assume that training data and test data are drawn from the same distribution. However, this assumption often cannot be satisfied in the task of clinical concept extraction. The main aim of this paper was to use training data from one institution to build a concept extraction model for data from another institution with a different distribution. An instance-based transfer learning method, TrAdaBoost, was applied in this work. To prevent the occurrence of a negative transfer phenomenon with TrAdaBoost, we integrated it with Bagging, which provides a "softer" weights update mechanism with only a tiny amount of training data from the target domain. Two data sets named BETH and PARTNERS from the 2010 i2b2/VA challenge as well as BETHBIO, a data set we constructed ourselves, were employed to show the effectiveness of our work's transfer ability. Our method outperforms the baseline model by 2.3% and 4.4% when the baseline model is trained by training data that are combined from the source domain and the target domain in two experiments of BETH vs. PARTNERS and BETHBIO vs. PARTNERS, respectively. Additionally, confidence intervals for the performance metrics suggest that our method's results have statistical significance. Moreover, we explore the applicability of our method for further experiments. With our method, only a tiny amount of labeled data from the target domain is required to build a concept extraction model that produces better performance.
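For orientation, one round of the weight update in plain TrAdaBoost can be sketched as below. This shows the standard update as generally described for the original algorithm, not the Bagging-integrated "softer" variant proposed in the abstract, whose details are not given here. The function name, the argument layout, and the clamp on the target error are illustrative assumptions.

```python
import math
from typing import List, Tuple


def tradaboost_update(src_w: List[float], tgt_w: List[float],
                      src_err: List[float], tgt_err: List[float],
                      n_rounds: int) -> Tuple[List[float], List[float]]:
    """One TrAdaBoost reweighting step.

    src_err / tgt_err hold the per-instance loss |h(x) - y| in [0, 1] of the
    current weak learner on source-domain and target-domain instances.
    """
    n_source = len(src_w)
    # Fixed shrink factor for misclassified source instances.
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_source) / n_rounds))
    # Weighted error measured on the target domain only.
    eps = sum(w * e for w, e in zip(tgt_w, tgt_err)) / sum(tgt_w)
    # The algorithm assumes eps < 0.5; this clamp is a guard added for the sketch.
    eps = min(max(eps, 1e-10), 0.499)
    beta_t = eps / (1.0 - eps)
    # Source instances the learner gets wrong lose weight (likely dissimilar data).
    new_src = [w * beta ** e for w, e in zip(src_w, src_err)]
    # Target instances the learner gets wrong gain weight, as in AdaBoost.
    new_tgt = [w * beta_t ** (-e) for w, e in zip(tgt_w, tgt_err)]
    return new_src, new_tgt
```

The asymmetry is the core of the method: errors on source data are treated as evidence that the instance does not fit the target distribution, while errors on target data mark instances that need more attention.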
Subjects
Artificial Intelligence, Data Mining/methods, Electronic Health Records, Natural Language Processing, Humans, Medical Informatics, Vocabulary, Controlled
ABSTRACT
OBJECTIVE: To explore the feasibility of validating Dutch concept extraction tools using annotated corpora translated from English, focusing on preserving annotations during translation and addressing the scarcity of non-English annotated clinical corpora. MATERIALS AND METHODS: Three annotated corpora were standardized and translated from English to Dutch using 2 machine translation services, Google Translate and OpenAI GPT-4, with annotations preserved through a proposed method of embedding annotations in the text before translation. The performance of 2 concept extraction tools, MedSpaCy and MedCAT, was assessed across the corpora in both Dutch and English. RESULTS: The translation process effectively generated Dutch annotated corpora and the concept extraction tools performed similarly in both English and Dutch. Although there were some differences in how annotations were preserved across translations, these did not affect extraction accuracy. Supervised MedCAT models consistently outperformed unsupervised models, whereas MedSpaCy demonstrated high recall but lower precision. DISCUSSION: Our validation of Dutch concept extraction tools on corpora translated from English was successful, highlighting the efficacy of our annotation preservation method and the potential for efficiently creating multilingual corpora. Further improvements and comparisons of annotation preservation techniques and strategies for corpus synthesis could lead to more efficient development of multilingual corpora and accurate non-English concept extraction tools. CONCLUSION: This study has demonstrated that translated English corpora can be used to validate non-English concept extraction tools. The annotation preservation method used during translation proved effective, and future research can apply this corpus translation method to additional languages and clinical settings.
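A rough sketch of what embedding annotations in the text before translation could look like is given below. The abstract does not describe the actual markup, so the `<eN>...</eN>` tag format, the function names, and the restriction to non-overlapping spans are all assumptions made for illustration; the translation step itself is left out.

```python
import re
from typing import Dict, List, Tuple

Annotation = Tuple[int, int, str]  # (start, end exclusive, label)

# Hypothetical inline tag: <e0>...</e0>, <e1>...</e1>, ...
TAG = re.compile(r"<e(\d+)>(.*?)</e\1>")


def embed(text: str, annotations: List[Annotation]) -> Tuple[str, Dict[int, str]]:
    """Wrap each non-overlapping span in a numbered tag; return text and id-to-label map."""
    pieces, labels, cursor = [], {}, 0
    for index, (start, end, label) in enumerate(sorted(annotations)):
        pieces.append(text[cursor:start])
        pieces.append(f"<e{index}>{text[start:end]}</e{index}>")
        labels[index] = label
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), labels


def extract(tagged: str, labels: Dict[int, str]) -> Tuple[str, List[Annotation]]:
    """Strip the tags and recompute span offsets in the (translated) plain text."""
    pieces, spans, cursor, offset = [], [], 0, 0
    for match in TAG.finditer(tagged):
        before = tagged[cursor:match.start()]
        pieces.append(before)
        offset += len(before)
        inner = match.group(2)
        spans.append((offset, offset + len(inner), labels[int(match.group(1))]))
        pieces.append(inner)
        offset += len(inner)
        cursor = match.end()
    pieces.append(tagged[cursor:])
    return "".join(pieces), spans
```

In use, the tagged string returned by `embed` would be sent to the translation service, and `extract` would be applied to whatever comes back; spans whose tags the service drops or mangles are simply not recovered, which corresponds to the differences in annotation preservation that the abstract mentions.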
Subjects
Translating, Netherlands, Natural Language Processing, Humans, Language, Data Mining/methods
ABSTRACT
Notes documented by clinicians, such as patient histories, hospital courses, lab reports and others, are often annotated with standardized clinical codes by medical coders to facilitate a variety of secondary processing applications such as billing and statistical analyses. Clinical coding, traditionally manual and labor-intensive, has seen a surge in research interest from deep learning researchers seeking to automate it. However, deep learning methods require large volumes of annotated clinical data for training and offer little to explain why codes were assigned to pieces of text. In this paper, we propose an unsupervised method which does not need annotated clinical text and is fully interpretable, by using Named Entity and Attribute Recognition and word embeddings specialized for the clinical domain. These methods successfully glean important information from large volumes of clinical notes and encode it effectively in order to perform automatic clinical coding.
Subjects
Clinical Coding, Natural Language Processing, Humans
ABSTRACT
OBJECTIVE: To develop a natural language processing system that solves both clinical concept extraction and relation extraction in a unified prompt-based machine reading comprehension (MRC) architecture with good generalizability for cross-institution applications. METHODS: We formulate both clinical concept extraction and relation extraction using a unified prompt-based MRC architecture and explore state-of-the-art transformer models. We compare our MRC models with existing deep learning models for concept extraction and end-to-end relation extraction using 2 benchmark datasets developed by the 2018 National NLP Clinical Challenges (n2c2) challenge (medications and adverse drug events) and the 2022 n2c2 challenge (relations of social determinants of health [SDoH]). We also evaluate the transfer learning ability of the proposed MRC models in a cross-institution setting. We perform error analyses and examine how different prompting strategies affect the performance of MRC models. RESULTS AND CONCLUSION: The proposed MRC models achieve state-of-the-art performance for clinical concept and relation extraction on the 2 benchmark datasets, outperforming previous non-MRC transformer models. GatorTron-MRC achieves the best strict and lenient F1-scores for concept extraction, outperforming previous deep learning models on the 2 datasets by 1%-3% and 0.7%-1.3%, respectively. For end-to-end relation extraction, GatorTron-MRC and BERT-MIMIC-MRC achieve the best F1-scores, outperforming previous deep learning models by 0.9%-2.4% and 10%-11%, respectively. For cross-institution evaluation, GatorTron-MRC outperforms traditional GatorTron by 6.4% and 16% for the 2 datasets, respectively. The proposed method is better at handling nested/overlapped concepts, extracting relations, and has good portability for cross-institute applications. Our clinical MRC package is publicly available at https://github.com/uf-hobi-informatics-lab/ClinicalTransformerMRC.
Subjects
Comprehension, Drug-Related Side Effects and Adverse Reactions, Humans, Natural Language Processing
ABSTRACT
The extraction of patient signs and symptoms recorded as free text in electronic health records is critical for precision medicine. Once extracted, signs and symptoms can be made computable by mapping to signs and symptoms in an ontology. Extracting signs and symptoms from free text is tedious and time-consuming. Prior studies have suggested that inter-rater agreement for clinical concept extraction is low. We have examined inter-rater agreement for annotating neurologic concepts in clinical notes from electronic health records. After training on the annotation process, the annotation tool, and the supporting neuro-ontology, three raters annotated 15 clinical notes in three rounds. Inter-rater agreement between the three annotators was high for text span and category label. A machine annotator based on a convolutional neural network had a high level of agreement with the human annotators but one that was lower than human inter-rater agreement. We conclude that high levels of agreement between human annotators are possible with appropriate training and annotation tools. Furthermore, more training examples combined with improvements in neural networks and natural language processing should make machine annotators capable of high throughput automated clinical concept extraction with high levels of agreement with human annotators.
ABSTRACT
Objective: Healthcare data such as clinical notes are primarily recorded in an unstructured manner. If adequately translated into structured data, they can be utilized for health economics and set the groundwork for better individualized patient care. To structure clinical notes, deep-learning methods, particularly transformer-based models like Bidirectional Encoder Representations from Transformers (BERT), have recently received much attention. Currently, biomedical applications are primarily focused on the English language. While general-purpose German-language models such as GermanBERT and GottBERT have been published, adaptations for biomedical data are unavailable. This study evaluated the suitability of existing and novel transformer-based models for the German biomedical and clinical domain. Materials and Methods: We used 8 transformer-based models and pre-trained 3 new models on a newly generated biomedical corpus, and systematically compared them with each other. We annotated a new dataset of clinical notes and used it with 4 other corpora (BRONCO150, CLEF eHealth 2019 Task 1, GGPONC, and JSynCC) to perform named entity recognition (NER) and document classification tasks. Results: General-purpose language models can be used effectively for biomedical and clinical natural language processing (NLP) tasks, still, our newly trained BioGottBERT model outperformed GottBERT on both clinical NER tasks. However, training new biomedical models from scratch proved ineffective. Discussion: The domain-adaptation strategy's potential is currently limited due to a lack of pre-training data. Since general-purpose language models are only marginally inferior to domain-specific models, both options are suitable for developing German-language biomedical applications. Conclusion: General-purpose language models perform remarkably well on biomedical and clinical NLP tasks. If larger corpora become available in the future, domain-adapting these models may improve performances.
ABSTRACT
Colonoscopy is a screening and diagnostic procedure for detection of colorectal carcinomas with specific quality metrics that monitor and improve adenoma detection rates. These quality metrics are stored in disparate documents i.e., colonoscopy, pathology, and radiology reports. The lack of integrated standardized documentation is impeding colorectal cancer research. Clinical concept extraction using Natural Language Processing (NLP) and Machine Learning (ML) techniques is an alternative to manual data abstraction. Contextual word embedding models such as BERT (Bidirectional Encoder Representations from Transformers) and FLAIR have enhanced performance of NLP tasks. Combining multiple clinically-trained embeddings can improve word representations and boost the performance of the clinical NLP systems. The objective of this study is to extract comprehensive clinical concepts from the consolidated colonoscopy documents using concatenated clinical embeddings. We built high-quality annotated corpora for three report types. BERT and FLAIR embeddings were trained on unlabeled colonoscopy related documents. We built a hybrid Artificial Neural Network (h-ANN) to concatenate and fine-tune BERT and FLAIR embeddings. To extract concepts of interest from three report types, 3 models were initialized from the h-ANN and fine-tuned using the annotated corpora. The models achieved best F1-scores of 91.76%, 92.25%, and 88.55% for colonoscopy, pathology, and radiology reports respectively.
ABSTRACT
Objective: Microbiology culture reports contain critical information for important clinical and public health applications. However, microbiology reports often have complex, semistructured, free-text data that present a barrier for secondary use. Here we present the development and validation of an open-source package designed to ingest free-text microbiology reports, determine whether the culture is positive, and return a list of Systematized Nomenclature of Medicine (SNOMED)-CT mapped bacteria. Materials and Methods: Our concept extraction Python package, MicrobEx, is built upon a rule-based natural language processing algorithm and was developed using microbiology reports from 2 different electronic health record systems in a large healthcare organization, and then externally validated on the reports of 2 other institutions with manually reviewed results as a benchmark. Results: MicrobEx achieved F1 scores >0.95 on all classification tasks across 2 independent validation sets with minimal customization. Additionally, MicrobEx matched or surpassed our MetaMap-based benchmark algorithm performance across positive culture classification and species capture classification tasks. Discussion: Our results suggest that MicrobEx can be used to reliably estimate binary bacterial culture status, extract bacterial species, and map these to SNOMED organism observations when applied to semistructured, free-text microbiology reports from different institutions with relatively low customization. Conclusion: MicrobEx offers an open-source software solution (available on both GitHub and PyPI) for bacterial culture status estimation and bacterial species extraction from free-text microbiology reports. The package was designed to be reused and adapted to individual institutions as an upstream process for other clinical applications such as machine learning, clinical decision support, and disease surveillance systems.
ABSTRACT
Although deep learning has been applied to the recognition of diseases and drugs in electronic health records and the biomedical literature, relatively little study has been devoted to the utility of deep learning for the recognition of signs and symptoms. The recognition of signs and symptoms is critical to the success of deep phenotyping and precision medicine. We have developed a named entity recognition model that uses deep learning to identify text spans containing neurological signs and symptoms and then maps these text spans to the clinical concepts of a neuro-ontology. We compared a model based on convolutional neural networks to one based on bidirectional encoder representation from transformers. Models were evaluated for accuracy of text span identification on three text corpora: physician notes from an electronic health record, case histories from neurologic textbooks, and clinical synopses from an online database of genetic diseases. Both models performed best on the professionally-written clinical synopses and worst on the physician-written clinical notes. Both models performed better when signs and symptoms were represented as shorter text spans. Consistent with prior studies that examined the recognition of diseases and drugs, the model based on bidirectional encoder representations from transformers outperformed the model based on convolutional neural networks for recognizing signs and symptoms. Recall for signs and symptoms ranged from 59.5% to 82.0% and precision ranged from 61.7% to 80.4%. With further advances in NLP, fully automated recognition of signs and symptoms in electronic health records and the medical literature should be feasible.
ABSTRACT
Accurately recording a patient's medical conditions in an EHR system is the basis of effectively documenting patient health status, coding for billing, and supporting data-driven clinical decision making. However, patient conditions are often not fully captured in structured EHR systems, but may be documented in unstructured clinical notes. The challenge is that not all disease mentions in clinical notes actually refer to a patient's conditions. We developed a two-step workflow for identifying patient's conditions from clinical notes: disease mention extraction and disease mention classification. We implemented this workflow in a prototype system, DI++, for Disease Identification. An advanced deep learning model, CLSTM-Attention model, is developed for disease mention classification in DI++. Extensive empirical evaluation on about one million pages of de-identified clinical notes demonstrates that DI++ has significant performance advantage over existing systems on F1 Score, Area Under the Curve metrics, and efficiency. The proposed CLSTM-Attention model outperforms the existing deep learning models for disease mention classification.