1.
J Biomed Inform ; 150: 104599, 2024 02.
Article in English | MEDLINE | ID: mdl-38272433

ABSTRACT

OBJECTIVE: Event extraction plays a crucial role in natural language processing. However, in the biomedical domain, nested events make event extraction more complex than it is for single events, and these events usually have strong semantic relationships and constraints. Previous approaches ignored the binding connections between these complex nested events. This study aims to develop a unified framework based on event constraint information that jointly extracts biomedical event triggers and arguments and improves the performance of nested biomedical event extraction. MATERIAL AND METHODS: We propose a multi-task learning framework based on constraint information, called CMBEE, for biomedical event extraction. The N-tuple form of event patterns is used to represent the constraint information, which is integrated into the role detection and event type classification tasks. The framework uses an attention mechanism and a gating mechanism to fuse information from multiple tuples, together with local and global constraint fusion methods to explore the connections between events further. RESULTS: Experimental results demonstrate that our proposed method achieves the highest F1 score on the Multi-Level Event Extraction (MLEE) biomedical corpus and performs favorably on the Biomedical Natural Language Processing (BioNLP) Shared Task 2013 Genia event corpus (GE 13). CONCLUSIONS: The experimental results indicate that modeling event patterns and constraints for multi-event extraction tasks is effective for complex biomedical event extraction. The fusion strategy proposed in this study, which incorporates different constraint information, helps to better express semantic information.


Subjects
Machine Learning , Natural Language Processing , Semantics , Data Mining/methods
2.
BMC Bioinformatics ; 23(1): 20, 2022 Jan 06.
Article in English | MEDLINE | ID: mdl-34991458

ABSTRACT

BACKGROUND: In biomedical research, extracting chemical and disease relations from unstructured biomedical literature is an essential task. Effective context understanding and knowledge integration are the two main research problems in this task. Most work on relation extraction focuses on classifying entity mention pairs. Given the effectiveness of machine reading comprehension (RC) for context understanding, solving biomedical relation extraction with the RC framework at both the intra-sentential and inter-sentential levels is a new topic worth exploring. Besides unstructured biomedical text, many structured knowledge bases (KBs) provide valuable guidance for biomedical relation extraction, so using knowledge in the RC framework is also worth investigating. We propose a knowledge-enhanced reading comprehension (KRC) framework that leverages reading comprehension and prior knowledge for biomedical relation extraction. First, we generate questions for each relation, which reformulates the relation extraction task as a question answering task. Second, based on the RC framework, we integrate knowledge representations through an efficient knowledge-enhanced attention interaction mechanism to guide biomedical relation extraction. RESULTS: The proposed model was evaluated on the BioCreative V CDR dataset and the CHR dataset. Experiments show that our model achieved competitive document-level F1 scores of 71.18% and 93.3%, respectively, compared with other methods. CONCLUSION: Result analysis reveals that open-domain reading comprehension data and knowledge representation can help improve biomedical relation extraction in our proposed KRC framework. Our work can encourage more research on bridging reading comprehension and biomedical relation extraction and can advance biomedical relation extraction.
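
The reformulation of relation extraction as question answering can be illustrated with a minimal Python sketch. The template wording, the relation names, the helper functions `build_question` and `to_qa_example`, and the example sentence are all hypothetical; the abstract does not give the questions actually used in KRC.

```python
# Hypothetical question templates, one per relation type (not taken from the paper).
QUESTION_TEMPLATES = {
    "chemical-induced-disease": "Which diseases are induced by {head}?",
    "chemical-reaction": "Which chemicals react with {head}?",
}


def build_question(relation: str, head_entity: str) -> str:
    """Turn a (relation, head entity) pair into a natural-language question."""
    return QUESTION_TEMPLATES[relation].format(head=head_entity)


def to_qa_example(relation: str, head_entity: str, document: str) -> dict:
    """Recast one relation-extraction instance as a question-answering instance."""
    return {"question": build_question(relation, head_entity), "context": document}


example = to_qa_example(
    "chemical-induced-disease",
    "lithium",
    "Lithium treatment was associated with tremor in several patients.",
)
print(example["question"])
```

A reading comprehension model would then be asked to find answer spans (here, candidate diseases) in the context, which is one plausible way to read the first step described above.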


Subjects
Biomedical Research , Comprehension , Knowledge Bases , Language
3.
J Biomed Inform ; 127: 104011, 2022 03.
Article in English | MEDLINE | ID: mdl-35176451

ABSTRACT

Automatic medical event prediction (MEP), e.g., diagnosis prediction and medication prediction, using electronic health records (EHRs) is a popular research direction in health informatics. In many cases, MEP relies on determinations from different types of medical events, which reflects the heterogeneous nature of EHRs. However, most existing methods for MEP fail to model separately the type of event that is most closely associated with the prediction task, i.e., the task-wise event, which usually plays a more significant role than other events. In this paper, we proposed a Long Short-Term Memory network (LSTM)-based method for MEP, named Multi-Channel Fusion LSTM (MCF-LSTM), which models the correlations between different types of medical events using multiple network channels. To this end, we designed a task-wise fusion module, in which a gated network selects how much information is transferred between events. Furthermore, the irregular temporal interval between adjacent medical visits is modeled in an individual channel, which is combined with other events in a unified manner. We compared MCF-LSTM with state-of-the-art methods on four MEP tasks on two public datasets: MIMIC-III and eICU. Experimental results show that MCF-LSTM achieves promising results on AUC (area under the receiver operating characteristic curve), AUPR (area under the precision-recall curve), and top-k recall, and outperforms other methods with high stability.
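
The idea of a gate that decides how much information passes from one event channel to another can be sketched as follows. This is a simplified element-wise gate written for illustration only; the function `gated_transfer`, its parameter layout, and the numbers are assumptions and do not reproduce the actual MCF-LSTM fusion module.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_transfer(task_state, other_state, gate_weights, gate_bias):
    """Add a gated share of other_state to task_state, element by element.

    gate_weights[i] holds two hypothetical weights (for the task-wise element
    and the other-event element); the gate value lies strictly between 0 and 1.
    """
    fused = []
    for i, (t, o) in enumerate(zip(task_state, other_state)):
        gate = sigmoid(gate_weights[i][0] * t + gate_weights[i][1] * o + gate_bias[i])
        fused.append(t + gate * o)
    return fused


# With zero weights and zero bias every gate is 0.5, so half of the
# other channel's state is transferred.
fused = gated_transfer([1.0, 2.0], [4.0, 6.0], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
print(fused)
```

In a trained network the gate weights would be learned, so the amount transferred would depend on the content of both channels.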


Subjects
Electronic Health Records , Medical Informatics , Neural Networks, Computer , ROC Curve
4.
J Biomed Inform ; 136: 104238, 2022 12.
Article in English | MEDLINE | ID: mdl-36400329

ABSTRACT

OBJECTIVE: Biomedical named entity normalization (BNEN) is a fundamental natural language processing (NLP) task in the biomedical domain. Many representation learning-based methods have been successfully applied to BNEN in recent years. Most of them encode a given biomedical named entity mention (BNEM) and its candidates separately, and some consider relations between the BNEM and its candidates. However, few consider relations among the candidates, which may be useful for BNEN. MATERIAL AND METHODS: In this paper, we propose a novel interaction-based synonym marginalization method for BNEN, called IA-BIOSYN, which captures both the relations between a given mention and its candidates and those among the candidates. In IA-BIOSYN, given a BNEM, a candidate selector first obtains the candidates of the BNEM dynamically, an interaction module then models BNEM-candidate relations as well as candidate-candidate relations, and finally a synonym marginalization module determines which candidate(s) the BNEM should be mapped to. To validate the effectiveness of our proposed method, we compare it with other state-of-the-art (SOTA) methods on three public BNEN datasets: NCBI-Disease, BC5CDR-Disease and BC5CDR-Chemical. RESULTS: Our proposed method achieves Acc@1 of 0.9333, 0.9379 and 0.9693 on NCBI-Disease, BC5CDR-Disease and BC5CDR-Chemical, respectively, significantly better than other SOTA methods. CONCLUSIONS: Both the relations between a given BNEM and its candidates and the relations among the candidates are useful for BNEN, and the proposed IA-BIOSYN can capture the two types of relations effectively.
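
Synonym marginalization, in its general form, sums the probability mass of all candidate names that belong to the same concept. The sketch below illustrates that idea only; the concept identifiers, the scores, and the function `marginalize_synonyms` are hypothetical, and the candidate selector and interaction module of IA-BIOSYN are not modeled.

```python
import math
from collections import defaultdict


def marginalize_synonyms(candidates):
    """Map each concept ID to the summed softmax probability of its candidate names.

    candidates: list of (concept_id, score) pairs, one per candidate synonym.
    """
    max_score = max(score for _, score in candidates)
    # Subtract the maximum before exponentiating for numerical stability.
    exp_scores = [math.exp(score - max_score) for _, score in candidates]
    total = sum(exp_scores)
    marginals = defaultdict(float)
    for (concept_id, _), exp_score in zip(candidates, exp_scores):
        marginals[concept_id] += exp_score / total
    return dict(marginals)


# Two synonyms of the hypothetical concept "D001" and one name of "D002".
candidates = [("D001", 2.0), ("D002", 2.5), ("D001", 2.0)]
marginals = marginalize_synonyms(candidates)
best_concept = max(marginals, key=marginals.get)
print(best_concept, marginals)
```

In this toy case the single highest-scoring candidate belongs to "D002", but the two synonyms of "D001" together carry more probability mass, which shows why relations among candidates can matter.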


Subjects
Natural Language Processing
5.
J Biomed Inform ; 128: 104035, 2022 04.
Article in English | MEDLINE | ID: mdl-35217186

ABSTRACT

OBJECTIVE: External knowledge, such as a lexicon of Chinese words or a domain knowledge graph (KG) of concepts, has recently been adopted to improve the performance of machine learning methods for named entity recognition (NER), as it can provide additional information beyond context. However, most existing studies consider knowledge from only one source (i.e., either a lexicon or a knowledge graph), in different ways, and treat lexicon words or KG concepts independently of their boundaries. In this paper, we focus on leveraging multi-source knowledge in a unified manner, where lexicon words or KG concepts are combined with their boundaries, for Chinese clinical NER (CNER). MATERIAL AND METHODS: We propose a novel method based on the relational graph convolutional network (RGCN), called MKRGCN, to utilize multi-source knowledge in a unified manner for CNER. For any sentence, a relational graph is constructed from the words or concepts in each knowledge source, in which the lexicon words or KG concepts appearing in the sentence are linked to the tokens they contain, together with the boundary information of those words or concepts. RGCN is used to model all relational graphs constructed from multi-source knowledge, and the token representations derived from multi-source knowledge are integrated into the context representations of tokens via an attention mechanism. Based on the knowledge-enhanced token representations, we deploy a conditional random field (CRF) layer for named entity label prediction. In this study, a lexicon of words and a medical knowledge graph are used as knowledge sources for CNER. RESULTS: Our proposed method achieves the best performance on the Chinese datasets CCKS2017 and CCKS2018, with F1-scores of 91.88% and 89.91%, respectively, significantly outperforming existing methods. Extended experiments on the English datasets NCBI-Disease and BC2GM also demonstrate the effectiveness of our method when only one knowledge source is considered via RGCN. CONCLUSION: The MKRGCN model can effectively integrate knowledge from an external lexicon and a knowledge graph for CNER and has the potential to be applied to English NER.
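
Linking lexicon words to the tokens they contain, with boundary information, can be sketched as a small edge-building routine. The B/I/E/S edge labels, the function `build_word_token_edges`, the example sentence, and the two-word lexicon are assumptions made for illustration; the abstract does not specify how MKRGCN encodes boundaries.

```python
def build_word_token_edges(sentence, lexicon):
    """Return (word, token_index, boundary_label) edges for lexicon words in a sentence.

    Each character is treated as a token. The labels mark whether the token
    begins (B), lies inside (I), or ends (E) the matched word, or is a
    single-character word (S). This labeling scheme is a hypothetical choice.
    """
    edges = []
    for start in range(len(sentence)):
        for end in range(start + 1, len(sentence) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            for idx in range(start, end):
                if end - start == 1:
                    label = "S"
                elif idx == start:
                    label = "B"
                elif idx == end - 1:
                    label = "E"
                else:
                    label = "I"
                edges.append((word, idx, label))
    return edges


edges = build_word_token_edges("患者有糖尿病", {"患者", "糖尿病"})
for edge in edges:
    print(edge)
```

One such edge list could be built per knowledge source, giving the separate relational graphs that a graph network would then encode.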


Subjects
Language , Neural Networks, Computer , China , Delivery of Health Care , Machine Learning
6.
BMC Bioinformatics ; 22(Suppl 1): 600, 2021 Dec 17.
Article in English | MEDLINE | ID: mdl-34920699

ABSTRACT

BACKGROUND: Biomedical named entity recognition (NER) is a fundamental task of biomedical text mining that finds the boundaries of entity mentions in biomedical text and determines their entity type. To accelerate the development of biomedical NER techniques in Spanish, the PharmaCoNER organizers launched a competition to recognize pharmacological substances, compounds, and proteins. Biomedical NER is usually treated as a sequence labeling task, and almost all state-of-the-art sequence labeling methods ignore the meaning of different entity types. In this paper, we investigate some methods to introduce the meaning of entity types in deep learning methods for biomedical NER and apply them to the PharmaCoNER 2019 challenge. The meaning of each entity type is represented by its definition information. MATERIAL AND METHOD: We investigate how to use entity definition information in the following two methods: (1) SQuAD-style machine reading comprehension (MRC) methods that treat entity definition information as the query and biomedical text as the context and predict answer spans as entities. (2) Span-level one-pass (SOne) methods that predict entity spans one type at a time and introduce entity type meaning, which is represented by entity definition information. All models are trained and tested on the PharmaCoNER 2019 corpus, and their performance is evaluated by strict micro-average precision, recall, and F1-score. RESULTS: Entity definition information improves both the SQuAD-style MRC and SOne methods by about 0.003 in micro-averaged F1-score. The SQuAD-style MRC model using entity definition information as the query achieves the best performance, with a micro-averaged precision of 0.9225, a recall of 0.9050, and an F1-score of 0.9137. It outperforms the best model of the PharmaCoNER 2019 challenge by 0.0032 in F1-score. Compared with the state-of-the-art model that does not use manually crafted features, our model obtains a 1% improvement in F1-score, which is significant. These results indicate that entity definition information is useful for deep learning methods on biomedical NER. CONCLUSION: Our entity definition information enhanced models achieve a state-of-the-art micro-average F1 score of 0.9137, which implies that entity definition information has a positive impact on biomedical NER. In the future, we will explore more entity definition information from knowledge graphs.
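
Micro-averaged precision, recall, and F1-score pool the true positives, false positives, and false negatives over all entity types before computing the ratios. The sketch below shows the calculation on hypothetical counts (the type names `TYPE_A` and `TYPE_B` and their counts are invented) and then checks that the reported F1-score is the harmonic mean of the reported precision and recall.

```python
def micro_prf(counts):
    """Compute micro-averaged precision, recall and F1.

    counts: dict mapping entity type -> (true_positives, false_positives, false_negatives).
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical per-type counts, for illustration only.
precision, recall, f1 = micro_prf({"TYPE_A": (90, 10, 5), "TYPE_B": (80, 5, 10)})
print(round(precision, 4), round(recall, 4), round(f1, 4))

# Consistency check on the values reported in the abstract.
reported_f1 = 2 * 0.9225 * 0.9050 / (0.9225 + 0.9050)
print(round(reported_f1, 4))
```

The second calculation reproduces the reported F1-score of 0.9137 from the reported precision and recall.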


Subjects
Deep Learning
7.
J Med Internet Res ; 23(2): e24813, 2021 02 18.
Article in English | MEDLINE | ID: mdl-33599615

ABSTRACT

BACKGROUND: The adoption rate of electronic health records (EHRs) in hospitals has become a main index to measure digitalization in medicine in each country. OBJECTIVE: This study summarizes and shares the experiences with EHR adoption in China and in the United States. METHODS: Using the 2007-2018 annual hospital survey data from the Chinese Health Information Management Association (CHIMA) and the 2008-2017 United States American Hospital Association Information Technology Supplement survey data, we compared the trends in EHR adoption rates in China and the United States. We then used the Bass model to fit these data and to analyze the modes of diffusion of EHRs in these 2 countries. Finally, using the 2007, 2010, and 2014 CHIMA and Healthcare Information and Management Systems Services survey data, we analyzed the major challenges faced by hospitals in China and the United States in developing health information technology. RESULTS: From 2007 to 2018, the average adoption rates of the sampled hospitals in China increased from 18.6% to 85.3%, compared to the increase from 9.4% to 96% in US hospitals from 2008 to 2017. The annual average adoption rates in Chinese and US hospitals were 6.1% and 9.6%, respectively. However, the annual average number of hospitals adopting EHRs was 1500 in China and 534 in the US, indicating that the former might require more effort. Both countries faced similar major challenges for hospital digitalization. CONCLUSIONS: The adoption rates of hospital EHRs in China and the United States have both increased significantly in the past 10 years. The number of hospitals that adopted EHRs in China exceeded 16,000, which was 3.3 times that of the 4814 nonfederal US hospitals. This faster adoption outcome may have been a benefit of top-level design and government-led policies, particularly the inclusion of EHR adoption as an important indicator for performance evaluation and the appointment of public hospitals.
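
For readers unfamiliar with the Bass model mentioned in the methods, the standard closed form of its cumulative adoption fraction is F(t) = (1 - e^{-(p+q)t}) / (1 + (q/p) e^{-(p+q)t}), where p is the coefficient of innovation and q the coefficient of imitation. The sketch below evaluates that curve; the parameter values are hypothetical and are not the values fitted in the study.

```python
import math


def bass_cumulative_adoption(t: float, p: float, q: float) -> float:
    """Cumulative fraction of adopters at time t under the standard Bass model."""
    decay = math.exp(-(p + q) * t)
    return (1.0 - decay) / (1.0 + (q / p) * decay)


# Hypothetical coefficients: p (innovation) and q (imitation).
curve = [bass_cumulative_adoption(t, p=0.03, q=0.38) for t in range(12)]
for year, fraction in enumerate(curve):
    print(year, round(fraction, 3))
```

Fitting the model to survey data would amount to choosing p and q (and a market size) so that this curve tracks the observed adoption rates.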


Subjects
Data Analysis , Electronic Health Records/standards , China , Humans , Surveys and Questionnaires , Time Factors , United States
8.
BMC Med Inform Decis Mak ; 21(Suppl 9): 251, 2021 11 16.
Article in English | MEDLINE | ID: mdl-34789238

ABSTRACT

BACKGROUND: Drug repurposing seeks new indications for approved drugs and is essential for investigating new uses of approved or investigational drugs efficiently. The active gene annotation corpus (AGAC), annotated by human experts, was developed to support knowledge discovery for drug repurposing. The AGAC track of the BioNLP Open Shared Tasks, which uses this corpus, is organized by EMNLP-BioNLP 2019, and its "selective annotation" attribute makes the AGAC track more challenging than traditional sequence labeling tasks. In this work, we present our methods for trigger word detection (Task 1) and thematic role identification (Task 2) in the AGAC track. As a step forward in drug repurposing research, our work can also be applied to large-scale automatic extraction of knowledge from medical text. METHODS: To meet the challenges of the two tasks, we treat Task 1 as a medical named entity recognition (NER) task, which captures molecular phenomena related to gene mutation, and Task 2 as a relation extraction task, which captures the thematic roles between entities. In this work, we exploit pre-trained biomedical language representation models (e.g., BioBERT) in the information extraction pipeline for mutation-disease knowledge collection from PubMed. Moreover, we design the fine-tuning framework using a multi-task learning technique and extra features. We further investigate different approaches to consolidating and transferring knowledge from varying sources and illustrate the performance of our model on the AGAC corpus. Our approach is based on fine-tuned BERT, BioBERT, NCBI BERT, and ClinicalBERT using multi-task learning. Further experiments show the effectiveness of knowledge transfer and of ensembling the models of the two tasks. We compare the performance of various algorithms and conduct an ablation study on the development set of Task 1 to examine the effectiveness of each component of our method. RESULTS: Compared with competitor methods, our model obtained the highest precision (0.63), recall (0.56), and F-score (0.60) in Task 1, ranking first. It outperformed the baseline method provided by the organizers by 0.10 in F-score. The model shared the same encoding layers for the named entity recognition and relation extraction parts. We obtained the second-highest F-score (0.25) in Task 2 with a simple but effective framework. CONCLUSIONS: Experimental results on the benchmark annotation of genes with active mutation-centric function changes corpus show that integrating pre-trained biomedical language representation models (i.e., BERT, NCBI BERT, ClinicalBERT, BioBERT) into a pipeline of information extraction methods with multi-task learning can improve the ability to collect mutation-disease knowledge from PubMed.


Subjects
Natural Language Processing , Pharmaceutical Preparations , Algorithms , Humans , Information Storage and Retrieval , Knowledge Discovery
9.
BMC Med Inform Decis Mak ; 21(Suppl 7): 368, 2021 12 30.
Article in English | MEDLINE | ID: mdl-34969377

ABSTRACT

OBJECTIVE: Relation extraction (RE) is a fundamental task of natural language processing that has drawn considerable attention from researchers, especially RE at the document level. We aim to explore an effective novel method for document-level medical relation extraction. METHODS: We propose a novel edge-oriented graph neural network based on document structure and external knowledge for document-level medical RE, called SKEoG. This network can take full advantage of document structure and external knowledge. RESULTS: We evaluate SKEoG on two public datasets, the Chemical-Disease Relation (CDR) dataset and the Chemical Reactions (CHR) dataset, by comparing it with other state-of-the-art methods. SKEoG achieves the highest F1-score of 70.7 on the CDR dataset and an F1-score of 91.4 on the CHR dataset. CONCLUSION: The proposed SKEoG method achieves new state-of-the-art performance. Both document structure and external knowledge can improve performance in the EoG framework. Selecting proper methods for knowledge node representation is also very important.


Subjects
Natural Language Processing , Neural Networks, Computer , Humans , Knowledge Bases , Research Design
10.
BMC Med Inform Decis Mak ; 21(Suppl 2): 94, 2021 07 30.
Article in English | MEDLINE | ID: mdl-34330253

ABSTRACT

BACKGROUND: Text Matching (TM) is a fundamental task of natural language processing that is widely used in application systems such as information retrieval, automatic question answering, machine translation, dialogue systems, and reading comprehension. In recent years, a large number of deep learning neural networks have been applied to TM and have repeatedly set new benchmark results. Among these networks, the convolutional neural network (CNN) is one of the most popular, but it has difficulty handling small samples and preserving the relative structure of features. In this paper, we propose a novel deep learning architecture for TM based on the capsule network, called CapsTM. The capsule network is a new type of neural network architecture proposed to address some of the shortcomings of CNN, and it shows great potential in many tasks. METHODS: CapsTM is a five-layer neural network, including an input layer, a representation layer, an aggregation layer, a capsule layer and a prediction layer. In CapsTM, two pieces of text are first individually converted into sequences of embeddings and are further transformed by a highway network in the input layer. Then, in the representation layer, Bidirectional Long Short-Term Memory (BiLSTM) is used to represent each piece of text, and an attention-based interaction matrix is used to represent the interactive information of the two pieces of text. Subsequently, the two kinds of representations are fused by BiLSTM in the aggregation layer and are further represented with capsules (vectors) in the capsule layer. Finally, the prediction layer is a connected network used for classification. CapsTM is an extension of ESIM that adds a capsule layer before the prediction layer. RESULTS: We construct a corpus of Chinese medical question matching, which contains 36,360 question pairs. This corpus is randomly split into three parts: a training set of 32,360 question pairs, a development set of 2000 question pairs and a test set of 2000 question pairs. On this corpus, we conduct a series of experiments to evaluate the proposed CapsTM and compare it with other state-of-the-art methods. CapsTM achieves the highest F-score of 0.8666. CONCLUSION: The experimental results demonstrate that CapsTM is effective for Chinese medical question matching and outperforms the other state-of-the-art methods compared.
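
The random three-way split described in the results can be reproduced in outline as follows. The seed, the function `split_corpus`, and the integer stand-ins for question pairs are assumptions; the abstract does not say how the randomization was performed.

```python
import random


def split_corpus(pairs, dev_size, test_size, seed=0):
    """Shuffle the corpus and cut it into training, development and test sets."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    dev = shuffled[:dev_size]
    test = shuffled[dev_size:dev_size + test_size]
    train = shuffled[dev_size + test_size:]
    return train, dev, test


# Integers stand in for the 36,360 question pairs of the corpus.
pairs = list(range(36360))
train, dev, test = split_corpus(pairs, dev_size=2000, test_size=2000)
print(len(train), len(dev), len(test))
```

Fixing the seed makes the split repeatable, which matters when several models are compared on the same development and test sets.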


Subjects
Natural Language Processing , Neural Networks, Computer , China , Humans , Information Storage and Retrieval , Language
11.
BMC Med Inform Decis Mak ; 20(Suppl 1): 72, 2020 04 30.
Article in English | MEDLINE | ID: mdl-32349764

ABSTRACT

BACKGROUND: Semantic textual similarity (STS) is a fundamental natural language processing (NLP) task that is widely used in NLP applications such as Question Answering (QA) and Information Retrieval (IR). It is a typical regression problem, and almost all STS systems use either a distributed representation or a one-hot representation to model sentence pairs. METHODS: In this paper, we proposed a novel framework based on a gated network to fuse the distributed representation and the one-hot representation of sentence pairs. Some current state-of-the-art distributed representation methods, including Convolutional Neural Network (CNN), Bi-directional Long Short-Term Memory networks (Bi-LSTM) and Bidirectional Encoder Representations from Transformers (BERT), were used in our framework, and a system based on this framework was developed for a shared task on clinical STS organized by BioCreative/OHNLP in 2018. RESULTS: Compared with systems using only a distributed representation or only a one-hot representation, our method achieved a much higher Pearson correlation. Among all distributed representations, BERT performed best. The highest Pearson correlation of our system was 0.8541, which is 0.0213 higher than the best official result of the BioCreative/OHNLP clinical STS shared task in 2018 (0.8328). CONCLUSIONS: Distributed representation and one-hot representation are complementary and can be fused by a gated network.
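
The evaluation metric used here, the Pearson correlation between gold and predicted similarity scores, can be computed as in the sketch below. The score lists are hypothetical and serve only to show the calculation.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical gold and predicted similarity scores for five sentence pairs.
gold = [0.0, 1.0, 2.5, 4.0, 5.0]
predicted = [0.5, 1.0, 2.0, 4.5, 4.5]
r = pearson_correlation(gold, predicted)
print(round(r, 4))
```

Because the coefficient is invariant to linear rescaling, a system is rewarded for ranking and spacing sentence pairs like the gold scores, not for matching their absolute values.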


Subjects
Information Storage and Retrieval/methods , Natural Language Processing , Neural Networks, Computer , Semantics , Humans , Language
12.
Entropy (Basel) ; 22(10)2020 Oct 15.
Article in English | MEDLINE | ID: mdl-33286931

ABSTRACT

Time series prediction has been widely applied in the finance industry, in applications such as stock market price and commodity price forecasting. Machine learning methods have been widely used in financial time series prediction in recent years. How to label financial time series data is a hot topic, because labeling determines the prediction accuracy of machine learning models and, in turn, final investment returns. Existing labeling methods for financial time series mainly label data by comparing the current data with those of a short time period in the future. However, financial time series data are typically non-linear with obvious short-term randomness. Therefore, these labeling methods do not capture the continuous trend features of financial time series data, leading to a difference between their labeling results and real market trends. In this paper, a new labeling method called "continuous trend labeling" is proposed to address this problem. In the feature preprocessing stage, this paper proposes a new method that avoids the look-ahead bias of traditional data standardization or normalization processes. Then, a detailed logical explanation is given, continuous trend labeling is defined, and an automatic labeling algorithm is given to extract the continuous trend features of financial time series data. Experiments on the Shanghai Composite Index, the Shenzhen Component Index, and some Chinese stocks showed that our labeling method performs much better than state-of-the-art labeling methods in terms of classification accuracy and some other classification evaluation metrics. The results also showed that deep learning models such as LSTM and GRU are more suitable for predicting financial time series data.
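
Look-ahead bias arises when a value is standardized with statistics that include observations from its own future. One common way to avoid it, shown here as an assumption and not as the paper's method, is to standardize each point with the mean and standard deviation of only the data observed up to that point. The function `expanding_zscore` and the price series are hypothetical.

```python
import statistics


def expanding_zscore(series, min_history=2):
    """Standardize each value using only observations up to and including it."""
    standardized = []
    for t in range(len(series)):
        history = series[: t + 1]
        if len(history) < min_history:
            standardized.append(0.0)  # not enough history to estimate a spread
            continue
        mean = statistics.fmean(history)
        std = statistics.pstdev(history)
        standardized.append((series[t] - mean) / std if std > 0 else 0.0)
    return standardized


prices = [10.0, 11.0, 12.0, 11.0, 13.0]
print([round(z, 3) for z in expanding_zscore(prices)])
```

With this scheme, changing later prices leaves the earlier standardized values untouched, which is the property that full-sample standardization lacks.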

13.
BMC Bioinformatics ; 20(1): 330, 2019 Jun 13.
Article in English | MEDLINE | ID: mdl-31196129

ABSTRACT

BACKGROUND: Ontology has attracted substantial attention from both academia and industry. Handling reasoning under uncertainty is important in ontology research. For example, when a patient is suffering from cirrhosis, the appearance of abdominal vein varices is four times more likely than the presence of bitter taste. Such medical knowledge is crucial for decision-making in various medical applications but is missing from existing medical ontologies. In this paper, we aim to discover medical knowledge probabilities from electronic medical record (EMR) texts to enrich ontologies. First, we build an ontology by identifying meaningful entity mentions from EMRs. Then, we propose a symptom-dependency-aware naïve Bayes classifier (SDNB) that is based on the assumption that there is a level of dependency among symptoms. To ensure the accuracy of the diagnostic classification, we incorporate the probability of a disease into the ontology via innovative approaches. RESULTS: We conduct a series of experiments to evaluate whether the proposed method can discover meaningful and accurate probabilities for medical knowledge. Based on over 30,000 deidentified medical records, we explore 336 abdominal diseases and 81 related symptoms. Among these 336 gastrointestinal diseases, the probabilities of 31 diseases are obtained via our method. These 31 probabilities of diseases and 189 conditional probabilities between diseases and the symptoms are added into the generated ontology. CONCLUSION: In this paper, we propose a medical knowledge probability discovery method that is based on the analysis and extraction of EMR text data for enriching a medical ontology with probability information. The experimental results demonstrate that the proposed method can effectively identify accurate medical knowledge probability information from EMR data. In addition, the proposed method can efficiently and accurately calculate the probability of a patient suffering from a specified disease, thereby demonstrating the advantage of combining an ontology and a symptom-dependency-aware naïve Bayes classifier.
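
As background for the classifier described above, the sketch below computes disease posteriors with a plain naïve Bayes model, which treats symptoms as conditionally independent given the disease. It is not the symptom-dependency-aware variant (SDNB) proposed in the paper, and every prior and conditional probability in it is hypothetical; the values for cirrhosis were chosen only to mirror the "four times more likely" example.

```python
def naive_bayes_posterior(priors, likelihoods, observed_symptoms):
    """Posterior probability of each disease given the observed symptoms.

    priors: disease -> P(disease)
    likelihoods: disease -> {symptom -> P(symptom | disease)}
    """
    scores = {}
    for disease, prior in priors.items():
        score = prior
        for symptom in observed_symptoms:
            score *= likelihoods[disease][symptom]
        scores[disease] = score
    total = sum(scores.values())
    return {disease: score / total for disease, score in scores.items()}


# Hypothetical probabilities, for illustration only.
priors = {"cirrhosis": 0.2, "gastritis": 0.8}
likelihoods = {
    "cirrhosis": {"abdominal vein varices": 0.4, "bitter taste": 0.1},
    "gastritis": {"abdominal vein varices": 0.02, "bitter taste": 0.3},
}
posterior = naive_bayes_posterior(priors, likelihoods, ["abdominal vein varices"])
print({disease: round(p, 3) for disease, p in posterior.items()})
```

Storing the priors and conditional probabilities in an ontology, as the abstract describes, would make tables like `priors` and `likelihoods` available to such a classifier.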


Subjects
Algorithms , Bayes Theorem , Diagnostic Techniques and Procedures , Electronic Health Records , Knowledge Bases , Area Under Curve , Disease , Humans , Probability , ROC Curve
14.
BMC Med Inform Decis Mak ; 19(Suppl 3): 74, 2019 04 04.
Article in English | MEDLINE | ID: mdl-30943972

ABSTRACT

BACKGROUND: Clinical entity recognition, a fundamental task of clinical text processing, has attracted a great deal of attention during the last decade. However, most studies focus on clinical text in English rather than in other languages. Recently, a few researchers have begun to study entity recognition in Chinese clinical text. METHODS: In this paper, a novel deep neural network, called attention-based CNN-LSTM-CRF, is proposed to recognize entities in Chinese clinical text. Attention-based CNN-LSTM-CRF extends LSTM-CRF by introducing a CNN (convolutional neural network) layer after the input layer to capture local context information of the words of interest and an attention layer before the CRF layer to select relevant words in the same sentence. RESULTS: To evaluate the proposed method, we compare it with two other currently popular methods, CRF (conditional random field) and LSTM-CRF, on two benchmark datasets. One of the datasets is publicly available and contains only contiguous clinical entities, and the other was constructed by us and contains both contiguous and discontiguous clinical entities. Experimental results show that attention-based CNN-LSTM-CRF outperforms CRF and LSTM-CRF. CONCLUSIONS: The CNN and the attention mechanism are each beneficial to an LSTM-CRF-based Chinese clinical entity recognition system, regardless of whether only contiguous clinical entities are considered. The contribution of the attention mechanism is greater than that of the CNN.


Subjects
Electronic Health Records , Natural Language Processing , Neural Networks, Computer , Algorithms , China , Datasets as Topic , Humans
15.
BMC Med Inform Decis Mak ; 19(Suppl 1): 17, 2019 01 31.
Article in English | MEDLINE | ID: mdl-30700331

ABSTRACT

BACKGROUND: The goal of temporal indexing is to select a time of occurrence or a time interval for each medical entity in clinical notes, so that all medical entities can be indexed on a unified timeline, which could assist the understanding of clinical notes and the further application of medical entities. Several shared tasks on temporal relations for medical entities in English clinical notes have been organized in the past few years, such as the 2012 i2b2 NLP challenge and the 2015 and 2016 Clinical TempEval challenges. In these tasks, many heuristic rule-based and machine learning-based systems have been developed. In recent years, deep neural network models have shown great potential on many problems, including relation classification. METHODS: In this paper, we propose a recurrent convolutional neural network (RNN-CNN) model for the temporal indexing task, which consists of four layers: input layer - generates a representation for each context word of medical entities or temporal expressions; LSTM (long short-term memory) layer - learns the context information of each word in a sentence and outputs a new word representation sequence; CNN layer - extracts meaningful features from a sentence and outputs a new representation for the medical entity or temporal expression; output layer - takes the representations of the medical entity, the temporal expression and relation features as input and classifies the temporal relation. Finally, the time or time interval for each medical entity can be directly selected according to the probability of each temporal relation predicted by the above model. RESULTS: To investigate the performance of our RNN-CNN model on the temporal indexing task, several baseline methods were also employed, namely rule-based, support vector machine (SVM), convolutional neural network (CNN) and recurrent neural network (RNN) methods. Experiments conducted on a manually annotated corpus (including 563 clinical notes with 12,611 medical entities and 4006 temporal expressions) show that the RNN-CNN model achieves the best F1-score of 75.97% for temporal relation classification and the best accuracy of 71.96% for temporal indexing. CONCLUSIONS: Neural network methods, which can capture more semantic information from the context of medical entities and temporal expressions, perform much better than the traditional rule-based and SVM-based methods. Moreover, all our methods perform much better on exact time indexing than on time interval indexing, so improving the performance of time interval indexing will be the main focus of our future work.


Subjects
Data Mining , Electronic Health Records , Medical Informatics Applications , Neural Networks, Computer , China , Humans , Time Factors
16.
BMC Med Inform Decis Mak ; 19(Suppl 10): 277, 2019 12 27.
Article in English | MEDLINE | ID: mdl-31881967

ABSTRACT

BACKGROUND: Family history (FH) information, including family members, the side of the family they belong to (i.e., maternal or paternal), their living status, and their observations (diseases), is very important in the decision-making process of disorder diagnosis and treatment. However, FH information cannot be used directly by computers because it is always embedded in unstructured text in electronic health records (EHRs). Extracting FH information from clinical text therefore requires natural language processing (NLP). The BioCreative/OHNLP2018 challenge includes a task on FH extraction (task 1) with two subtasks: (1) entity identification, which identifies family members and their observations (diseases) mentioned in clinical text; and (2) family history extraction, which extracts the side of the family, living status, and observations of family members. For this task, we propose a system based on deep joint learning methods to extract FH information. Our system achieves the highest F1-scores of 0.8901 on subtask 1 and 0.6359 on subtask 2.


Subjects
Deep Learning , Electronic Health Records , Information Storage and Retrieval/methods , Medical History Taking , Natural Language Processing , Algorithms , Clinical Decision-Making , Computational Biology , Humans
17.
BMC Med Inform Decis Mak ; 19(Suppl 2): 66, 2019 04 09.
Article in English | MEDLINE | ID: mdl-30961602

ABSTRACT

BACKGROUND: Chinese word segmentation (CWS) and part-of-speech (POS) tagging are two fundamental tasks of Chinese text processing. They are usually preliminary steps for many Chinese natural language processing (NLP) tasks. There have been a large number of studies on CWS and POS tagging in various domains; however, few have addressed CWS and POS tagging in the clinical domain, where it is not easy to determine the granularity of words. METHODS: In this paper, we investigated CWS and POS tagging for Chinese clinical text at a fine-granularity level and manually annotated a corpus. On the corpus, we compared two state-of-the-art methods, i.e., conditional random fields (CRF) and bidirectional long short-term memory (BiLSTM) with a CRF layer. To validate the plausibility of the fine-grained annotation, we further investigated the effect of CWS and POS tagging on Chinese clinical named entity recognition (NER) on another independent corpus. RESULTS: When only CWS was considered, CRF achieved higher precision, recall and F-measure than BiLSTM-CRF. When both CWS and POS tagging were considered, CRF also gained an advantage over BiLSTM. CRF outperformed BiLSTM-CRF by 0.14% in F-measure on CWS and by 0.34% in F-measure on POS tagging. For NER, the CWS information brought a maximum improvement of 0.34% in F-measure, while the CWS and POS information together brought a maximum improvement of 0.74% in F-measure. CONCLUSIONS: Our proposed fine-grained CWS and POS tagging corpus is reliable and meaningful, as the output of the CWS and POS tagging systems developed on this corpus improved the performance of a Chinese clinical NER system on another independent corpus.


Subjects
Electronic Health Records , Information Storage and Retrieval , Natural Language Processing , Speech , China , Humans
18.
J Med Syst ; 43(4): 92, 2019 Mar 05.
Article in English | MEDLINE | ID: mdl-30834481

ABSTRACT

Measuring drug-drug similarity is important but challenging. Significant progress has been made for drugs with sufficient, available labeled training data. However, handling data skewness and incompleteness with a domain-specific knowledge graph is still relatively new and under-explored territory. In this paper, we present KGDDS, a system for node-link-based biomedical knowledge graph curation and visualization that aids drug-drug similarity measurement. Specifically, we reuse existing knowledge bases to alleviate the difficulties in building a high-quality knowledge graph, ranging in size up to 7 million edges. Then we design a prediction model to explore the pharmacology features and knowledge graph features. Finally, we propose a user interaction model that allows the user to better understand drug properties from a drug similarity perspective and gain insights that are not easily observable in individual drugs. Visual demonstrations and experimental results indicate that KGDDS can bridge the user/caregiver gap by facilitating antibiotic prescription knowledge, and that it has remarkable applicability, outperforming existing state-of-the-art drug similarity measures.


Subjects
Audiovisual Aids , Drug Substitution/methods , Knowledge Bases , Neural Networks, Computer , Algorithms , Anti-Bacterial Agents/pharmacology , Humans , User-Computer Interface
19.
J Biomed Inform ; 88: 1-10, 2018 12.
Article in English | MEDLINE | ID: mdl-30399432

ABSTRACT

The process of learning candidate causal relationships involving diseases and symptoms from electronic medical records (EMRs) is the first step towards learning models that perform diagnostic inference directly from real healthcare data. However, existing diagnostic inference systems rely on knowledge bases such as ontologies that are manually compiled through a labour-intensive process or automatically derived using simple pairwise statistics. We explore CBN, a Clinical Bayesian Network construction for medical ontology probabilistic inference, to learn a high-quality Bayesian topology and a complete ontology directly from EMRs. Specifically, we first extract medical entity relationships from over 10,000 deidentified patient records and adopt the odds ratio (OR value) calculation and the K2 greedy algorithm to automatically construct a Bayesian topology. Then, Bayesian estimation is used for the probability distribution. Finally, we employ a Bayesian network to complete the causal relationships and probability distribution of the ontology, enhancing its inference capability. By evaluating the learned topology against the expert opinions of physicians and entropy calculations, and by calculating the ontology-based diagnosis classification, our study demonstrates that the direct and automated construction of a high-quality health topology and ontology from medical records is feasible. Our results are reproducible, and we will release the source code and CN-Stroke knowledge graph of this work after publication.
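
The odds ratio mentioned in the abstract above is a standard statistic on a 2x2 contingency table. The following is a minimal sketch of that calculation for a disease-symptom pair; the function name, the cell layout and the counts are illustrative assumptions, and the abstract does not say how the authors built their tables or which OR threshold, if any, they used to keep a candidate edge.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio of a 2x2 contingency table.

    a: records with the disease and the symptom
    b: records with the disease but without the symptom
    c: records without the disease but with the symptom
    d: records with neither
    """
    if b == 0 or c == 0:
        # The plain ratio is undefined here; a continuity correction would be
        # one option, but that is a design choice outside the abstract.
        raise ValueError("odds ratio undefined when b or c is zero")
    return (a * d) / (b * c)


# Hypothetical counts: OR = (30 * 190) / (70 * 10)
example_or = odds_ratio(30, 70, 10, 190)
```

An OR above 1 indicates that the symptom is more common among records with the disease than among those without it, which is the kind of association a structure-learning step could treat as a candidate edge.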


Subjects
Bayes Theorem , Electronic Health Records , Medical Informatics/methods , Algorithms , Data Collection , False Positive Reactions , Humans , Knowledge Bases , Odds Ratio , Probability , ROC Curve , Risk Factors , Software
20.
BMC Med Inform Decis Mak ; 18(Suppl 2): 60, 2018 07 23.
Article in English | MEDLINE | ID: mdl-30066652

ABSTRACT

BACKGROUND: Extracting relationships between chemicals and diseases from unstructured literature has attracted plenty of attention, since these relationships are very useful for a large number of biomedical applications such as drug repositioning and pharmacovigilance. A number of machine learning methods have been proposed for chemical-induced disease (CID) extraction, made possible by several publicly available annotated corpora. Most of them, except deep learning methods, suffer from time-consuming feature engineering. In this paper, we propose a novel document-level deep learning method, called recurrent piecewise convolutional neural networks (RPCNN), for CID extraction. RESULTS: Experimental results on a benchmark dataset, the CDR (Chemical-induced Disease Relation) dataset of the BioCreative V challenge for CID extraction, show that the highest precision, recall and F-score of our RPCNN-based CID extraction system are 65.24%, 77.21% and 70.77%, respectively, which is competitive with other state-of-the-art systems. CONCLUSIONS: A novel deep learning method is proposed for document-level CID extraction, in which domain knowledge, a piecewise strategy, an attention mechanism, and multi-instance learning are combined. The effectiveness of the method is demonstrated by experiments conducted on a benchmark dataset.


Subjects
Deep Learning , Neural Networks, Computer , Algorithms , Chemically-Induced Disorders , Datasets as Topic , Information Storage and Retrieval