Results 1 - 12 of 12
1.
BMC Med Inform Decis Mak ; 23(1): 210, 2023 10 10.
Article in English | MEDLINE | ID: mdl-37817193

ABSTRACT

BACKGROUND: Electronic medical records (EMRs) contain a wealth of information related to breast cancer diagnosis and treatment. Extracting relevant features from these records and constructing a knowledge graph can contribute significantly to efficient data analysis and decision support for breast cancer diagnosis. METHODS: A workflow was proposed for extracting breast cancer-related features from Chinese mammography reports and constructing a knowledge graph for breast cancer diagnosis. First, the concept layer of the knowledge graph was constructed from breast cancer diagnosis and treatment guidelines, along with insights from clinical experts. Next, a BiLSTM-Highway-CRF model was designed to extract the mammography features, which formed the data layer of the knowledge graph. Finally, the knowledge graph was constructed by combining the concept layer and the data layer in the Neo4j graph data platform, and was then applied to visualization analysis, semantic query, and computer-assisted diagnosis. RESULTS: Mammographic features were extracted from a total of 1171 mammography examination reports. Overall, the model achieved an accuracy rate of 97.16%, a recall rate of 98.06%, and an F1 score of 97.61%. Additionally, 47,660 relationships between entities were identified based on the four types of relationships defined in the concept layer. The knowledge graph for breast cancer diagnosis was constructed by loading the mammographic features and relationships into the Neo4j graph data platform. The model was assessed from the concept layer, data layer, and application layer perspectives, and showed promising results. CONCLUSIONS: The proposed workflow is applicable for constructing knowledge graphs for breast cancer diagnosis based on Chinese EMRs. This study serves as a reference for the rapid design, construction, and application of knowledge graphs for the diagnosis and treatment of other diseases. Furthermore, it offers a potential solution to the limited data sharing and format inconsistencies present in Chinese EMR data.
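The three RESULTS figures in this abstract can be cross-checked against the usual F1 definition. The following is a minimal sketch, assuming the reported "accuracy rate" plays the role of precision (the abstract does not say so explicitly); `f1_score` is a helper name introduced here for illustration only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same units)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, in percent.
# Assumption: the "accuracy rate" of 97.16% is used as precision.
print(round(f1_score(97.16, 98.06), 2))  # prints 97.61
```

Under that assumption, the harmonic mean rounds to the reported F1 score of 97.61%.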


Subjects
Breast Neoplasms; Electronic Health Records; Female; Humans; Breast Neoplasms/diagnostic imaging; East Asian People; Pattern Recognition, Automated; Semantics; Information Storage and Retrieval; Computer Simulation; Data Visualization
2.
BMC Med Inform Decis Mak ; 22(1): 303, 2022 11 21.
Article in English | MEDLINE | ID: mdl-36411432

ABSTRACT

BACKGROUND: As medical technology develops, information management in the medical field continues to mature. Medical big data analysis draws on the large amount of medical and health data stored in electronic medical systems, such as electronic medical records and medical reports. How to fully exploit the information contained in these data has long been a subject of research. The basis of text mining is named entity recognition (NER), which has its own particularities in the medical field, where inadequate text resources and a large number of specialized domain terms continue to pose significant challenges. METHODS: We improved the convolutional neural network model (imConvNet) to obtain additional text features, while retaining the classical BERT pre-trained model and the BiLSTM model for named entity recognition. The imConvNet model extracts additional word vector features and improves named entity recognition accuracy. The proposed model, named BERT-imConvNet-BiLSTM-CRF, is composed of four layers: a BERT embedding layer, which produces word embedding vectors; an imConvNet layer, which captures the context features of each character; a BiLSTM (Bidirectional Long Short-Term Memory) layer, which captures long-distance dependencies; and a CRF (Conditional Random Field) layer, which labels characters based on their features and transfer rules. RESULTS: The average F1 score on the public medical data set yidu-s4k reached 91.38% when combined with the classical model; on real electronic medical record text on impacted wisdom teeth, the model's F1 score was 93.89%. Both results are better than those of the classical models. CONCLUSIONS: The proposed model (imConvNet) significantly improves the recognition accuracy of Chinese medical named entities and is applicable to various medical corpora.
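The CRF layer described in this abstract labels characters from per-character features and transfer rules. A common way to decode such a layer is the Viterbi algorithm; the sketch below illustrates that general technique and is not the authors' implementation. The tag set, the scores, and the rule that forbids moving from `O` to `I` are all hypothetical values chosen for illustration.

```python
from typing import Dict, List


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]],
            tags: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence.

    emissions[t][tag] is the per-character score of `tag` at position t;
    transitions[prev][cur] is the score of moving from `prev` to `cur`.
    """
    # best[t][tag] = best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical example: two characters, BIO tags, and one transfer rule.
TAGS = ["B", "I", "O"]
TRANSITIONS = {p: {c: 0.0 for c in TAGS} for p in TAGS}
TRANSITIONS["O"]["I"] = -1e4  # an entity cannot continue right after "O"
EMISSIONS = [{"B": 0.4, "I": 0.0, "O": 0.5},
             {"B": 0.1, "I": 0.9, "O": 0.2}]
print(viterbi(EMISSIONS, TRANSITIONS, TAGS))  # prints ['B', 'I']
```

Picking the best tag per character independently would give `O` then `I`, which violates the transfer rule; the joint decoding prefers `B` then `I`.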


Subjects
Deep Learning; Names; Humans; Language; Data Mining; China
3.
BMC Med Inform Decis Mak ; 22(1): 72, 2022 03 23.
Article in English | MEDLINE | ID: mdl-35321705

ABSTRACT

OBJECTIVE: Pituitary adenomas are the most common type of pituitary disorder; they usually occur in young adults and often affect the patient's physical development, labor capacity and fertility. Clinical free texts in the electronic medical records (EMRs) of pituitary adenoma patients contain abundant diagnosis and treatment information. However, this information has not been well utilized because of the difficulty of extracting information from unstructured clinical texts. This study aims to enable machines to intelligently process clinical information and automatically extract clinical named entities for pituitary adenomas from Chinese EMRs. METHODS: The clinical corpus used in this study came from a pituitary adenoma neurosurgery treatment center of a 3A hospital in China. Four types of fine-grained clinical record texts were selected: notes on present illness, past medical history, case characteristics and family history of 500 pituitary adenoma inpatients. Dictionary-based matching, conditional random fields (CRF), bidirectional long short-term memory with CRF (BiLSTM-CRF), and bidirectional encoder representations from transformers with BiLSTM-CRF (BERT-BiLSTM-CRF) were used to extract clinical entities from a Chinese EMR corpus. A comprehensive dictionary was constructed from open source vocabularies and a domain dictionary for pituitary adenomas to support the dictionary-based matching method. We selected features such as part of speech, radical, document type, and character position to train the CRF-based model. Random character embeddings and character embeddings pretrained by BERT were used as the input features for the BiLSTM-CRF model and the BERT-BiLSTM-CRF model, respectively. Both strict and relaxed metrics were used to evaluate the performance of these methods. RESULTS: Experimental results demonstrated that the deep learning and other machine learning methods were able to automatically extract clinical named entities, including symptoms, body regions, diseases, family histories, surgeries, medications, and disease courses of pituitary adenomas, from Chinese EMRs. In overall performance, BERT-BiLSTM-CRF achieved the highest strict F1 value of 91.27% and the highest relaxed F1 value of 95.57%. Additional evaluations showed that BERT-BiLSTM-CRF performed best for almost all entity types except surgery and disease course. BiLSTM-CRF performed best in disease course entity recognition, and performed as well as the CRF model with part of speech, radical and document type features, with both strict and relaxed F1 values reaching 96.48%. The CRF model with part of speech, radical and document type features performed best in surgery entity recognition, with a relaxed F1 value of 95.29%. CONCLUSIONS: In this study, we evaluated four entity recognition methods for pituitary adenomas based on Chinese EMRs. The results demonstrate that deep learning methods can effectively extract various types of clinical entities with satisfactory performance. This study contributes to clinical named entity extraction from Chinese neurosurgical EMRs. The findings could also assist information extraction from other Chinese medical texts.
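The abstract reports both strict and relaxed F1 values without defining them. The sketch below shows one common reading, stated here as an assumption: a strict match requires identical boundaries and entity type, while a relaxed match only requires overlapping spans of the same type. The `Entity` tuple layout, the function names, and the example entities are hypothetical.

```python
from typing import List, Tuple

Entity = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def _overlaps(a: Entity, b: Entity) -> bool:
    """Same type and at least one shared character position."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def entity_f1(gold: List[Entity], pred: List[Entity], relaxed: bool = False) -> float:
    if relaxed:
        # Assumed relaxed rule: any overlap with a same-type entity counts.
        matched_pred = sum(any(_overlaps(p, g) for g in gold) for p in pred)
        matched_gold = sum(any(_overlaps(g, p) for p in pred) for g in gold)
    else:
        # Assumed strict rule: boundaries and type must be identical.
        matched_pred = matched_gold = len(set(gold) & set(pred))
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the second prediction misses one leading character.
gold = [(0, 2, "symptom"), (5, 9, "disease")]
pred = [(0, 2, "symptom"), (6, 9, "disease")]
print(entity_f1(gold, pred))                # prints 0.5
print(entity_f1(gold, pred, relaxed=True))  # prints 1.0
```

A boundary error of one character is penalized under the strict rule but not under the relaxed one, which is why relaxed scores are never lower than strict scores under these definitions.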


Subjects
Electronic Health Records; Pituitary Neoplasms; Humans; Information Storage and Retrieval; Language; Natural Language Processing; Pituitary Neoplasms/diagnosis
4.
BMC Bioinformatics ; 20(1): 62, 2019 Feb 01.
Article in English | MEDLINE | ID: mdl-30709336

ABSTRACT

BACKGROUND: Benefiting from big data, powerful computation and new algorithmic techniques, we have been witnessing the renaissance of deep learning, particularly the combination of natural language processing (NLP) and deep neural networks. The advent of electronic medical records (EMRs) has not only changed the format of medical records but also helped users to obtain information faster. However, research that uses Chinese EMRs directly faces many challenges, such as low quality, huge quantity, imbalance, and semi-structured or unstructured content, and in particular the high density of the Chinese language compared with English. Therefore, effective word segmentation, word representation and model architecture are the core technologies in the literature on Chinese EMRs. RESULTS: In this paper, we propose a deep learning framework to study intelligent diagnosis using Chinese EMR data, which incorporates a convolutional neural network (CNN) into an EMR classification application. The novelty of this paper is reflected in the following: (1) We construct a pediatric medical dictionary based on Chinese EMRs. (2) Word2vec word embeddings are used to represent the semantics of the content of Chinese EMRs. (3) A fine-tuned CNN model is constructed for pediatric diagnosis from Chinese EMR data. Our results on real-world pediatric Chinese EMRs demonstrate that the average accuracy and F1-score of the CNN models are up to 81%, which indicates the effectiveness of the CNN model for the classification of EMRs. In particular, a fine-tuned one-layer CNN performs best among all CNNs, recurrent neural network (RNN) (long short-term memory, gated recurrent unit) and CNN-RNN models, with average accuracy and F1-score both up to 83%. CONCLUSION: The CNN framework, which includes word segmentation, word embedding and model training, can serve as an intelligent auxiliary diagnosis tool for pediatricians. In particular, a fine-tuned one-layer CNN performs well, which indicates that word order does not appear to have a useful effect on our Chinese EMRs.


Subjects
Electronic Health Records; Language; Neural Networks, Computer; Dictionaries as Topic; Humans; Natural Language Processing; Semantics; Vocabulary
5.
BMC Med Inform Decis Mak ; 19(Suppl 5): 235, 2019 12 05.
Article in English | MEDLINE | ID: mdl-31801540

ABSTRACT

BACKGROUND: Clinical named entity recognition (CNER) is important for medical information mining and for building high-quality knowledge graphs. Because the text features of Chinese electronic medical records (EMRs) differ from those of natural language, and because EMRs contain a large number of professional and uncommon clinical terms, clinical named entity recognition in Chinese EMRs remains difficult. It is of great importance to eliminate semantic interference and to improve the model's ability to learn internal features autonomously from a small training corpus. METHODS: From the perspective of deep learning, we integrated the attention mechanism into a neural network and proposed an improved clinical named entity recognition method for Chinese electronic medical records, called BiLSTM-Att-CRF, which can capture more useful context information and avoid the loss of information caused by long-distance factors. In addition, medical dictionaries and part-of-speech (POS) features were introduced to improve the performance of the model. RESULTS: On the China Conference on Knowledge Graph and Semantic Computing (CCKS) 2017 and 2018 Chinese EMR corpora, our BiLSTM-Att-CRF model achieved better performance than other widely used models without additional features (F1-measure of 85.4% on CCKS 2018 and 90.29% on CCKS 2017), and achieved the best performance with POS and dictionary features (F1-measure of 86.11% on CCKS 2018 and 90.48% on CCKS 2017). In particular, the BiLSTM-Att-CRF model had a significant effect on improving recall. CONCLUSIONS: Our work preliminarily confirmed the validity of the attention mechanism in discovering key information and mining text features, which might provide useful ideas for future research on clinical named entity recognition in Chinese electronic medical records. In the future, we will explore deeper applications of the attention mechanism in neural networks.
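The abstract does not give the scoring function used in BiLSTM-Att-CRF, so the sketch below shows only the generic idea of an attention mechanism: each hidden state is weighted by its similarity to a query, and the weighted sum becomes a context vector. Dot-product scoring, the function names, and the toy vectors are assumptions made for illustration.

```python
import math
from typing import List, Tuple


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: List[float],
           states: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Weight each state by its dot-product similarity to the query.

    Returns the attention weights and the weighted-sum context vector.
    """
    scores = [sum(q * s for q, s in zip(query, state)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [sum(w * state[i] for w, state in zip(weights, states))
               for i in range(dim)]
    return weights, context


# Toy hidden states for three characters; the first and third are identical,
# so they receive the same weight no matter how far apart they are.
weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
print(weights, context)
```

Because the weights depend on similarity rather than distance, a distant but relevant character can contribute as much as a neighboring one, which is the property the abstract relies on for long-distance context.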


Subjects
Deep Learning; Electronic Health Records; Attention; China; Data Mining; Humans; Language; Natural Language Processing; Neural Networks, Computer; Semantics
6.
BMC Med Inform Decis Mak ; 19(Suppl 2): 65, 2019 04 09.
Article in English | MEDLINE | ID: mdl-30961622

ABSTRACT

BACKGROUND: The Named Entity Recognition (NER) task, a key step in the extraction of health information, has encountered many challenges in Chinese Electronic Medical Records (EMRs). First, the casual use of Chinese abbreviations and doctors' personal styles may result in multiple expressions of the same entity, and there is no common Chinese medical dictionary with which to perform accurate entity extraction. Second, an electronic medical record contains entities from a variety of categories, and the length of entities varies greatly across categories, which increases the difficulty of extraction for Chinese NER. Therefore, entity boundary detection becomes the key to accurate entity extraction from Chinese EMRs, and we need a model that supports the recognition of entities of multiple lengths without relying on any medical dictionary. METHODS: In this study, we incorporate part-of-speech (POS) information into the deep learning model to improve the accuracy of Chinese entity boundary detection. To avoid incorrect POS tagging of long entities, we propose a method called reduced POS tagging, which retains the tags of general words but not those of seemingly medical entities. The model proposed in this paper, named SM-LSTM-CRF, consists of three layers: a self-matching attention layer, which calculates the relevance of each character to the entire sentence; an LSTM (Long Short-Term Memory) layer, which captures the context features of each character; and a CRF (Conditional Random Field) layer, which labels characters based on their features and transfer rules. RESULTS: Experimental results on a Chinese EMR dataset show that the F1 value of SM-LSTM-CRF is 2.59% higher than that of LSTM-CRF. After adding the POS feature to the model, F1 improves by about 7.74%. Reduced POS tagging reduces false tagging of long entities, which increases the F1 value by 2.42% and yields an F1 score of 80.07%. CONCLUSIONS: The POS feature marked by reduced POS tagging, together with the self-matching attention mechanism, tightly constrains entity boundaries and performs well in the recognition of clinical entities.


Subjects
Deep Learning; Electronic Health Records; Information Storage and Retrieval; Natural Language Processing; Attention; China; Humans; Language; Speech
7.
BMC Med Inform Decis Mak ; 17(1): 117, 2017 Aug 08.
Article in English | MEDLINE | ID: mdl-28789686

ABSTRACT

BACKGROUND: Cardiovascular disease (CVD) has become the leading cause of death in China, and most cases can be prevented by controlling risk factors. The goal of this study was to build a corpus of CVD risk factor annotations based on Chinese electronic medical records (CEMRs). This corpus is intended to be used to develop a risk factor information extraction system that, in turn, can serve as a foundation for further study of the progress of risk factors and CVD. RESULTS: We designed a light annotation task to capture CVD risk factors with indicators, temporal attributes and assertions that were explicitly or implicitly displayed in the records. The task included: 1) preparing data; 2) creating guidelines for capturing annotations (these were created with the help of clinicians); 3) proposing an annotation method that includes building the draft guidelines, training the annotators, updating the guidelines, and constructing the corpus. We also proposed some novel annotation guidelines: (1) under-threshold medical examination values were annotated, for the purpose of studying the progress of risk factors and CVD; (2) possible and negative risk factors were considered for the same reason, and we created assertions for annotations; (3) we added four temporal attributes to CVD risk factors in CEMRs to capture long-term variations. A risk factor annotated corpus was then developed from the de-identified discharge summaries and progress notes of 600 patients. Built with the help of clinicians, this corpus has an inter-annotator agreement (IAA) F1-measure of 0.968, indicating high reliability. CONCLUSION: To the best of our knowledge, this is the first annotated corpus concerning CVD risk factors in CEMRs, and guidelines for capturing CVD risk factor annotations from CEMRs are proposed. The resulting document-level annotations can be applied in future studies to monitor risk factors and CVD over the long term.


Subjects
Cardiovascular Diseases; Electronic Health Records; Information Storage and Retrieval; Natural Language Processing; China; Humans; Risk Factors
8.
Math Biosci Eng ; 21(1): 1342-1355, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38303468

ABSTRACT

Extracting entity relations from unstructured Chinese electronic medical records is an important task in medical information extraction. However, Chinese electronic medical records are mostly document-length texts, and existing models are either unable to handle long text sequences or perform poorly on them. This paper proposes a neural network based on feature augmentation and a cascade binary tagging framework. First, we use a pre-trained model to tokenize the original text and obtain word embedding vectors. Second, the word vectors are fed into the feature augmentation network and fused with the original features and position features. Finally, the cascade binary tagging decoder generates the results. We also built a Chinese document-level electronic medical record dataset named VSCMeD, which contains 595 real electronic medical records from vascular surgery patients. The experimental results show that the model achieves a precision of 87.82% and a recall of 88.47%. On another Chinese medical dataset, CMeIE-V2, the model achieves a precision of 54.51% and a recall of 48.63%.
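The abstract names a cascade binary tagging decoder without describing it. In cascade binary tagging frameworks generally, each token gets two binary tags (span start and span end); subject spans are decoded first, and the same decoding is then repeated per relation to find objects. The sketch below covers only the span-decoding step under those assumptions; the threshold, the nearest-end pairing rule, and the probabilities are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # inclusive token indices (start, end)


def decode_spans(start_probs: List[float], end_probs: List[float],
                 threshold: float = 0.5) -> List[Span]:
    """Pair each predicted start with the nearest predicted end at or after it."""
    starts = [i for i, p in enumerate(start_probs) if p > threshold]
    ends = [i for i, p in enumerate(end_probs) if p > threshold]
    spans = []
    for s in starts:
        following = [e for e in ends if e >= s]  # ends are in ascending order
        if following:
            spans.append((s, following[0]))
    return spans


# Hypothetical tagger outputs for a five-token sentence.
start_probs = [0.9, 0.1, 0.1, 0.8, 0.2]
end_probs = [0.1, 0.7, 0.1, 0.1, 0.9]
print(decode_spans(start_probs, end_probs))  # prints [(0, 1), (3, 4)]
```

Because every start and end is tagged independently, several spans can be read from one sentence, and repeating the step per relation lets one subject take part in more than one relation.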


Subjects
Electronic Health Records; Neural Networks, Computer; Humans; Information Storage and Retrieval; China
9.
Front Med (Lausanne) ; 11: 1272224, 2024.
Article in English | MEDLINE | ID: mdl-38784240

ABSTRACT

Background: Venous thromboembolism (VTE) is characterized by high morbidity, mortality, and complex treatment. A VTE knowledge graph (VTEKG) can effectively integrate VTE-related medical knowledge and offer an intuitive description and analysis of the relations between medical entities. However, current methods for constructing knowledge graphs typically suffer from error propagation and redundant information. Methods: In this study, we propose a deep learning-based joint extraction model for Chinese electronic medical records, the Biaffine Common-Sequence Self-Attention Linker (BCSLinker), to address these issues, which often occur when constructing a VTEKG. First, the Biaffine Common-Sequence Self-Attention (BCsSa) module is employed to create global matrices and extract entities and relations simultaneously, mitigating error propagation. Second, a multi-label cross-entropy loss is used to diminish the impact of redundant information and enhance information extraction. Results: On electronic medical record data of VTE patients from a tertiary hospital, BCSLinker achieved an F1 score of 86.9%, outperforming the other joint entity and relation extraction models discussed in this study. In addition, we developed a question-answering system that uses the VTEKG as a structured data source. Conclusion: This study has constructed a more accurate and comprehensive VTEKG that can provide a reference for diagnosing, evaluating, and treating VTE as well as supporting patient self-care, which is of considerable clinical value.

10.
JMIR Med Inform ; 8(10): e18287, 2020 Oct 07.
Article in English | MEDLINE | ID: mdl-33026359

ABSTRACT

BACKGROUND: With the increasing incidence and mortality of digestive system tumor diseases in China, finding ways to use clinical experience data in Chinese electronic medical records (CEMRs) to determine potentially effective relationships between diagnosis and treatment has become a priority. As an important part of artificial intelligence, a knowledge graph is a powerful tool for information processing and knowledge organization that provides an ideal means to solve this problem. OBJECTIVE: This study aimed to construct a semantic-driven digestive system tumor knowledge graph (DSTKG) to represent the knowledge in CEMRs with fine granularity and semantics. METHODS: This paper focuses on the knowledge graph schema and semantic relationships, which were the main challenges in constructing a Chinese tumor knowledge graph. The DSTKG was developed through a multistep procedure. As an initial step, a complete DSTKG construction framework based on CEMRs was proposed. Then, a knowledge graph schema containing 7 classes and 16 kinds of semantic relationships was built, and the DSTKG was completed through knowledge extraction, named entity linking, and drawing of the knowledge graph. Finally, the quality of the DSTKG was evaluated from 3 aspects: data layer, schema layer, and application layer. RESULTS: Experts agreed that the DSTKG was good overall (mean score 4.20). The DSTKG performed especially well on "rationality of schema structure," "scalability," and "readability of results," with scores of 4.72, 4.67, and 4.69, respectively, which were much higher than the average. However, the small amount of data in the DSTKG negatively affected its "practicability" score. Compared with other Chinese tumor knowledge graphs, the DSTKG can represent more granular entities, properties, and semantic relationships. In addition, the DSTKG was flexible, allowing personalized customization to meet the designer's focus on specific interests in digestive system tumors. CONCLUSIONS: We constructed a granular semantic DSTKG. It could provide guidance for the construction of tumor knowledge graphs and a preliminary step toward the intelligent application of knowledge graphs based on CEMRs. Additional data sources and stronger research on assertion classification are needed to gain insight into the DSTKG's potential.
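The abstract describes a schema of classes and semantic relationships that the extracted knowledge must fit. A minimal sketch of that idea follows: a schema maps each relationship to the classes it may connect, and only triples that respect it enter the graph. The two relationships, three classes, and example entities are hypothetical placeholders, not the 7 classes and 16 relationships of the DSTKG, which the abstract does not list.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relationship, tail entity)

# Hypothetical schema layer: relationship -> (head class, tail class).
SCHEMA: Dict[str, Tuple[str, str]] = {
    "has_symptom": ("Disease", "Symptom"),
    "treated_by": ("Disease", "Treatment"),
}

# Hypothetical data layer: the class assigned to each extracted entity.
ENTITY_CLASS: Dict[str, str] = {
    "gastric cancer": "Disease",
    "abdominal pain": "Symptom",
    "gastrectomy": "Treatment",
}


def valid_triples(triples: List[Triple]) -> List[Triple]:
    """Keep only triples whose relationship and entity classes fit the schema."""
    kept = []
    for head, relation, tail in triples:
        expected = SCHEMA.get(relation)
        actual = (ENTITY_CLASS.get(head), ENTITY_CLASS.get(tail))
        if expected is not None and expected == actual:
            kept.append((head, relation, tail))
    return kept


candidates = [
    ("gastric cancer", "has_symptom", "abdominal pain"),
    ("gastric cancer", "treated_by", "gastrectomy"),
    ("abdominal pain", "treated_by", "gastric cancer"),  # classes do not fit
]
print(valid_triples(candidates))
```

Checking triples against the schema before loading them is one way to keep the data layer consistent with the schema layer, two of the aspects on which the abstract says the graph was evaluated.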

11.
Math Biosci Eng ; 17(4): 3498-3511, 2020 05 09.
Article in English | MEDLINE | ID: mdl-32987540

ABSTRACT

The combination of medicine and big data has led to explosive growth in the volume of electronic medical records (EMRs), which contain information of guiding significance for diagnosis. How to extract this information from EMRs has become a hot research topic. In this paper, we propose an approach based on an ELMo-ET-CRF model to extract medical named entities from Chinese electronic medical records (CEMRs). First, a domain-specific ELMo model is fine-tuned from a common ELMo model with 4679 raw CEMRs. Then we use the encoder from Transformer (ET) as our model's encoder to alleviate the long context dependency problem, and a CRF is used as the decoder. Finally, we compare the BiLSTM-CRF and ET-CRF models with word2vec and ELMo embeddings on CEMRs to validate the effectiveness of the ELMo-ET-CRF model. With the same training and test data, ELMo-ET-CRF outperforms all the other model architectures mentioned in this paper with an F1-score of 85.59%, which indicates the effectiveness of the proposed architecture; its performance is also competitive on the CCKS2019 leaderboard.
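A CRF decoder such as the one described here emits one tag per character, and those tags must then be grouped into entities before an F1-score can be computed. The sketch below assumes a BIO tagging scheme, which the abstract does not specify; the function name and the label names `DIS` and `SYM` are hypothetical.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Group BIO tags into (start, end_exclusive, label) entities."""
    spans = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # the sentinel closes a trailing entity
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


# Hypothetical per-character tags for a four-character sentence.
print(bio_to_spans(["B-DIS", "I-DIS", "O", "B-SYM"]))
# prints [(0, 2, 'DIS'), (3, 4, 'SYM')]
```

In this sketch an `I-` tag that does not continue an open entity of the same label is ignored, which is one of several reasonable conventions for handling malformed tag sequences.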


Subjects
Electronic Health Records; Language; China
12.
Comput Methods Programs Biomed ; 172: 1-10, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30902121

ABSTRACT

BACKGROUND AND OBJECTIVE: Early prevention of cardiovascular diseases (CVDs) can effectively prevent later loss of health, and detecting CVD risk factors is a simple way to achieve early prevention. Personal health records play a prominent role in health information extraction because of their factuality and reliability. This study describes how to extract risk factors for CVDs from Chinese electronic medical records (CEMRs). METHODS: The extraction process involves two tasks: (a) CVD risk factor recognition and (b) risk factor time and assertion classification. We treated risk factor recognition as a named entity recognition (NER) task and time and assertion classification as a text classification task. An information extraction pipeline consisting of NER and text classification modules with machine learning models was developed. In the risk factor recognition module, a bidirectional long short-term memory (BLSTM) network with extra risk factor textual feature input was built; for time and assertion classification, convolutional neural networks (CNNs) with risk factor type and section label input and a support vector machine (SVM) were built. RESULTS: The best performance achieved was an F1 value of 0.9609 for risk factor recognition, and F1 values of 0.9812 and 0.9612 for time and assertion classification, respectively. The experimental results showed that our system achieved high performance and can extract risk factors from CEMRs efficiently. CONCLUSIONS: The proposed system is the first system for CVD risk factor extraction from CEMRs and is competitive with risk factor extraction systems developed on English EMRs. Further, its good performance should have a strong influence on CVD prevention.


Subjects
Cardiovascular Diseases/etiology; Electronic Health Records; Information Storage and Retrieval; China; Humans; Machine Learning; Risk Factors