Results 1 - 16 of 16
1.
Bioinformatics ; 39(11)2023 11 01.
Article in English | MEDLINE | ID: mdl-37971954

ABSTRACT

MOTIVATION: In the medical field, multiple terminology bases coexist across different institutions and contexts, often resulting in redundant terms. Identifying overlapping terms among these bases holds significant potential for harmonizing multiple standards and establishing a unified framework, which enhances user access to comprehensive and well-structured medical information. However, most terminology bases differ not only in semantics but also in the hierarchy of their classification systems. Conventional approaches that rely on neighborhood-based methods such as GCNs may introduce errors due to differing superordinate and subordinate terms. It is therefore imperative to explore novel methods to tackle this structural challenge. RESULTS: To address this heterogeneity, this paper proposes a multi-view alignment approach that incorporates the hierarchical structure of terminologies. We utilize a BERT-based model to capture the recursive relationships among different levels of the hierarchy and consider the interaction information of names, neighbors, and hierarchy between different terminologies. We test our method on mapping files of three open medical terminologies, and the experimental results demonstrate that our method outperforms baseline methods by 2% in terms of Hits@1 and Hits@10. AVAILABILITY AND IMPLEMENTATION: The source code will be available at https://github.com/Ulricab/Bert-Path upon publication.
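
A minimal sketch of the core idea of hierarchy-aware matching, assuming a generic HuggingFace BERT checkpoint: serialize each root-to-term path as one sequence, encode it, and compare terms across terminologies by cosine similarity. The published Bert-Path model additionally learns recursive level interactions and multi-view fusion, which are not reproduced here.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModel.from_pretrained("bert-base-chinese")

def encode_path(path_terms):
    """Encode a root-to-term hierarchy path as one flat sequence."""
    text = " > ".join(path_terms)  # e.g. disease > heart disease > heart failure
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(**inputs).last_hidden_state[:, 0]  # [CLS] vector
    return torch.nn.functional.normalize(out, dim=-1)

# Cosine similarity between candidate term pairs from two terminologies.
a = encode_path(["疾病", "心脏病", "心力衰竭"])
b = encode_path(["循环系统疾病", "心功能不全"])
print(float(a @ b.T))  # higher score -> more likely the same concept
```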


Subject(s)
Software , Vocabulary, Controlled , Semantics , Benchmarking , Reference Standards
2.
Bioinformatics ; 39(5)2023 05 04.
Article in English | MEDLINE | ID: mdl-37220895

ABSTRACT

MOTIVATION: Biomedical relation extraction is a vital task for electronic health record mining and biomedical knowledge base construction. Previous work often adopts pipeline or joint methods to extract subject, relation, and object while ignoring the interaction between the subject-object entity pair and the relation within the triplet structure. However, we observe that the entity pair and relation within a triplet are highly related, which motivates us to build a framework for triplet extraction that can capture the rich interactions among the elements of a triplet. RESULTS: We propose a novel co-adaptive biomedical relation extraction framework based on a duality-aware mechanism. The framework is designed as a bidirectional extraction structure that fully accounts for interdependence in the duality-aware extraction of the subject-object entity pair and the relation. On top of the framework, we design a co-adaptive training strategy and a co-adaptive tuning algorithm as collaborative optimization methods between modules to further improve performance. Experiments on two public datasets show that our method achieves the best F1 among all state-of-the-art baselines and provides a strong performance gain in complex scenarios involving various overlapping patterns, multiple triplets, and cross-sentence triplets. AVAILABILITY AND IMPLEMENTATION: Code is available at https://github.com/11101028/CADA-BioRE.
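
As a rough illustration of the bidirectional idea only (not CADA-BioRE's learned co-adaptive modules), the toy sketch below merges triplets proposed by a subject-first extractor and an object-first extractor, keeping those the two directions agree on; `extract_s2o` and `extract_o2s` are hypothetical callables returning (subject, relation, object, score) tuples.

```python
def merge_bidirectional(extract_s2o, extract_o2s, sentence):
    """Combine subject-first and object-first extraction directions."""
    fwd = {(s, r, o): p for s, r, o, p in extract_s2o(sentence)}
    bwd = {(s, r, o): p for s, r, o, p in extract_o2s(sentence)}
    merged = {}
    for t in fwd.keys() | bwd.keys():
        if t in fwd and t in bwd:              # both directions agree: average scores
            merged[t] = (fwd[t] + bwd[t]) / 2
        else:                                   # one-sided proposal: demand high confidence
            p = fwd.get(t, bwd.get(t))
            if p > 0.9:
                merged[t] = p
    return merged
```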


Subject(s)
Algorithms , Data Mining , Data Mining/methods , Language , Knowledge Bases , Electronic Health Records
3.
BMC Med Inform Decis Mak ; 23(1): 34, 2023 02 14.
Article in English | MEDLINE | ID: mdl-36788504

ABSTRACT

In recent years, relation extraction from unstructured texts has become an important task in medical research. However, relation extraction requires a large labeled corpus, and manually annotating sequences is time-consuming and expensive. Efficient and economical annotation methods are therefore required to ensure the performance of relation extraction. This paper proposes an active learning method based on subsequences and distant supervision. The method selects information-rich subsequences as the sampling unit for annotation, instead of the full sentences used in traditional active learning. Additionally, it saves labeled subsequence texts and their corresponding labels in a continuously updated and maintained dictionary, and pre-labels the unlabeled set through text matching, following the idea of distant supervision. Finally, the method is combined with a Chinese-RoBERTa-CRF model for relation extraction from Chinese medical texts. Experiments on the CMeIE dataset achieve the best performance compared to existing methods, with a best F1 of 55.96% across the different sampling strategies.
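
A hedged sketch of what the sampling loop might look like, assuming a hypothetical `model.predict_proba` that returns per-token label distributions; the paper's exact informativeness measure and window handling are not reproduced.

```python
import math

label_dict = {}  # subsequence text -> label sequence, grown over annotation rounds

def token_entropy(probs):
    """Entropy of one token's predicted label distribution."""
    return -sum(p * math.log(p + 1e-12) for p in probs)

def select_subsequences(model, unlabeled, k=50, window=10):
    scored = []
    for sent in unlabeled:
        dists = model.predict_proba(sent)          # assumed API: per-token distributions
        for i in range(max(1, len(sent) - window)):
            text = "".join(sent[i:i + window])
            if text in label_dict:                 # distant supervision: reuse stored label
                continue
            score = sum(token_entropy(d) for d in dists[i:i + window])
            scored.append((score, text))
    scored.sort(reverse=True)                      # most uncertain subsequences first
    return [text for _, text in scored[:k]]       # send to annotators, then update label_dict
```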


Subject(s)
Problem-Based Learning , Supervised Machine Learning , Language , China , Reference Books, Medical
4.
Bioinformatics ; 37(20): 3610-3617, 2021 Oct 25.
Article in English | MEDLINE | ID: mdl-34037691

ABSTRACT

MOTIVATION: Medical terminology normalization aims to map clinical mentions to terminologies in a knowledge base, which plays an important role in analyzing electronic health records and in many downstream tasks. In this article, we focus on Chinese procedure terminology normalization. Terminology expressions vary widely, and one medical mention may be linked to multiple terminologies. Existing studies based on learning to rank do not fully consider the quality of negative samples during model training or the importance of keywords in this domain-specific task. RESULTS: We propose a combined recall-and-rank framework to solve these problems. A pair-wise BERT model with deep metric learning is used to recall candidates. Previous methods either train BERT point-wise or cast the task as multi-class classification, which may lead to serious efficiency problems or insufficient effectiveness. During model training, we design a novel online negative sampling algorithm to activate the pair-wise method. To deal with multi-implication scenarios, we train an implication number prediction task together with the recall task in a multi-task learning setting, since the two tasks are highly complementary. In the rank step, we propose a keyword-attentive mechanism that focuses on domain-specific information such as procedure sites and procedure types. Finally, a fusion block merges the results of the recall and rank models. Detailed experimental analysis shows that our proposed framework yields remarkable improvements in both performance and efficiency. AVAILABILITY AND IMPLEMENTATION: The source code will be available at https://github.com/sxthunder/CMTN upon publication.
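
One plausible reading of online negative sampling for pair-wise training, sketched below as in-batch hard-negative mining with a triplet margin loss; the encoder and margin value are assumptions, not the paper's exact algorithm.

```python
import torch
import torch.nn.functional as F

def hard_negative_triplet_loss(mention_emb, term_emb, margin=0.3):
    """mention_emb, term_emb: (B, d) L2-normalized; row i is a positive pair."""
    sim = mention_emb @ term_emb.T                 # (B, B) cosine similarities
    pos = sim.diag()                               # scores of matched terminologies
    sim.fill_diagonal_(-1.0)                       # mask positives out of negative pool
    hardest_neg = sim.max(dim=1).values            # hardest in-batch negative per mention
    return F.relu(margin + hardest_neg - pos).mean()
```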

5.
BMC Med Inform Decis Mak ; 20(Suppl 14): 331, 2020 12 15.
Article in English | MEDLINE | ID: mdl-33323114

ABSTRACT

BACKGROUND: Laboratory test results in electronic health records have been applied to many clinical big data analyses. However, it is quite common for the same laboratory examination item (i.e., lab indicator) to be presented under different Chinese names, owing to translation differences and the varying habits of hospitals, which distorts analysis results. METHODS: A framework with a recall model and a binary classification model is proposed, which reduces the alignment scale and improves the accuracy of lab indicator normalization. To reduce the alignment scale, tf-idf is used for candidate selection. To ensure the accuracy of the output, we utilize an enhanced sequential inference model (ESIM) for binary classification. Active learning is applied with a newly proposed selection strategy for reducing annotation cost. RESULTS: Since our indicator standardization method mainly focuses on Chinese indicator inconsistency, we perform our experiment on data from the Shanghai Hospital Development Center (SHDC), selecting clinical data from eight hospitals. The method achieves an F1-score of 92.08% in the final binary classification. As for active learning, the proposed strategy performs better than the random baseline and can outperform the model trained on the full data using only 43% of the training data. A case study on heart failure clinic analysis, conducted on a sub-dataset collected from SHDC, shows that our proposed method is practical and performs well in application. CONCLUSION: This work demonstrates that the proposed structure can be effectively applied to lab indicator normalization, and that active learning is suitable for reducing the cost of this task. Such a method is also valuable in data cleaning, data mining, text extraction, and entity alignment.
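
A minimal sketch of the tf-idf candidate-recall step using character n-grams, which suit Chinese indicator names; the standard list, mentions, and top-k value below are toy examples.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

standard_terms = ["血红蛋白", "白细胞计数", "丙氨酸氨基转移酶"]  # toy standard indicator list
vec = TfidfVectorizer(analyzer="char", ngram_range=(1, 2))
term_mat = vec.fit_transform(standard_terms)

def recall_candidates(mention, top_k=3):
    """Return the top-k standard terms most similar to a raw hospital spelling."""
    sims = cosine_similarity(vec.transform([mention]), term_mat)[0]
    ranked = sims.argsort()[::-1][:top_k]
    return [(standard_terms[i], float(sims[i])) for i in ranked]

print(recall_candidates("白细胞数"))  # candidates then go to the binary classifier
```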


Subject(s)
Electronic Health Records , Heart Failure , China , Delivery of Health Care , Heart Failure/diagnosis , Humans , Reference Standards
6.
J Biomed Inform ; 92: 103133, 2019 04.
Article in English | MEDLINE | ID: mdl-30818005

ABSTRACT

Clinical named entity recognition aims to identify and classify clinical terms in electronic health records, such as diseases, symptoms, treatments, exams, and body parts, and is a fundamental and crucial task for clinical and translational research. In recent years, deep neural networks have achieved significant success in named entity recognition and many other natural language processing tasks. Most of these algorithms are trained end to end and can automatically learn features from large-scale labeled datasets. However, these data-driven methods typically lack the capability to process rare or unseen entities. Previous statistical methods and feature engineering practice have demonstrated that human knowledge can provide valuable information for handling rare and unseen cases. In this paper, we propose a new model that combines data-driven deep learning approaches with knowledge-driven dictionary approaches. Specifically, we incorporate dictionaries into deep neural networks. In addition, we propose two architectures that extend the bi-directional long short-term memory neural network and five different feature representation schemes for the task. Computational results on the CCKS-2017 Task 2 benchmark dataset show that the proposed method achieves highly competitive performance compared with state-of-the-art deep learning methods.
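
A sketch of one way dictionary knowledge can be turned into per-character features, assuming a begin/inside/end flag scheme over n-gram matches; the paper's five representation schemes differ in detail.

```python
def dictionary_features(sentence, dictionary, max_len=6):
    """Flag each character by whether a dictionary entry begins, continues, or ends there."""
    feats = [[0, 0, 0] for _ in sentence]           # [begin, inside, end] per character
    for i in range(len(sentence)):
        for j in range(i + 1, min(len(sentence), i + max_len) + 1):
            if sentence[i:j] in dictionary:
                feats[i][0] = 1                      # an entry begins here
                for k in range(i + 1, j - 1):
                    feats[k][1] = 1                  # inside an entry
                feats[j - 1][2] = 1                  # an entry ends here
    return feats  # concatenated with character embeddings before the BiLSTM

print(dictionary_features("患者头痛发热", {"头痛", "发热"}))
```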


Subject(s)
Electronic Health Records , Natural Language Processing , Neural Networks, Computer , Data Curation/methods , Deep Learning , Humans , Language
7.
BMC Med Inform Decis Mak ; 19(1): 82, 2019 04 01.
Article in English | MEDLINE | ID: mdl-30935389

ABSTRACT

BACKGROUND: While doctors should analyze large amounts of electronic medical record (EMR) data to conduct clinical research, the analysis process requires information technology (IT) skills that most doctors in China do not have. METHODS: In this paper, we build a novel tool, QAnalysis, with which doctors enter their analytic requirements in natural language and receive charts and tables in return. For a given question, we first segment the sentence and then use a grammar parser to analyze its structure. After linking the segments to concepts and predicates in knowledge graphs, we convert the question into a set of triples connected by different kinds of operators. These triples are converted to queries in Cypher, the query language of Neo4j. Finally, the query is executed on Neo4j, and the results are returned to the user as tables and charts. RESULTS: The tool supports the top 50 questions gathered from two hospital departments with the Delphi method. We also gathered 161 questions from clinical research papers with statistical requirements on EMR data. Experimental results show that our tool can directly cover 78.20% of these statistical questions, with a precision as high as 96.36%. Extending coverage is easy to achieve with the knowledge-graph technology we adopted. A recorded demo can be accessed at https://github.com/NLP-BigDataLab/QAnalysis-project . CONCLUSION: Our tool shows great flexibility in processing different kinds of statistical questions and provides a convenient way for doctors to get statistical results directly from natural language.
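
A hedged sketch of the final translation step, turning parsed triples into a Cypher aggregation query; the node labels, relationship types, and property names here are invented for illustration, not QAnalysis's actual schema.

```python
def triples_to_cypher(triples):
    """triples: list of (subject_label, relation, object_label, condition) tuples."""
    match, where = [], []
    for i, (s, rel, o, cond) in enumerate(triples):
        match.append(f"(a{i}:{s})-[:{rel}]->(b{i}:{o})")
        if cond:                                     # e.g. ("name", "=", "心力衰竭")
            where.append(f"b{i}.{cond[0]} {cond[1]} {cond[2]!r}")
    query = "MATCH " + ", ".join(match)
    if where:
        query += " WHERE " + " AND ".join(where)
    return query + " RETURN count(DISTINCT a0) AS n"

print(triples_to_cypher([("Patient", "HAS_DIAGNOSIS", "Disease", ("name", "=", "心力衰竭"))]))
```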


Subject(s)
Biomedical Research , Electronic Health Records , Natural Language Processing , China , Humans , Pattern Recognition, Automated , Software
8.
BMC Med Inform Decis Mak ; 19(Suppl 8): 259, 2019 12 17.
Article in English | MEDLINE | ID: mdl-31842854

ABSTRACT

BACKGROUND: Electronic health records (EHRs) offer possibilities to improve patient care and facilitate clinical research. However, applications of EHRs face many challenges, such as temporality, high dimensionality, sparseness, noise, random error, and systematic bias. In particular, the sequential information in EHRs is very useful but difficult for traditional machine learning methods to exploit effectively. METHOD: In this paper, we propose a general-purpose patient representation learning approach to summarize sequential EHRs. Specifically, a recurrent neural network based denoising autoencoder (RNN-DAE) is employed to encode the in-hospital records of each patient into a low-dimensional dense vector. RESULTS: Based on EHR data collected from Shuguang Hospital, affiliated with Shanghai University of Traditional Chinese Medicine, we experimentally evaluate our proposed RNN-DAE method on both a mortality prediction task and a comorbidity prediction task. Extensive experimental results show that our proposed RNN-DAE method outperforms existing methods. In addition, we use the "Deep Feature" produced by the RNN-DAE to track similar patients with t-SNE, which yields some interesting observations. CONCLUSION: We propose an effective unsupervised RNN-DAE method to summarize patients' sequential information in EHR data. Our proposed RNN-DAE method is useful on both the mortality prediction task and the comorbidity prediction task.
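
A compact sketch of a GRU-based denoising autoencoder over visit sequences, assuming each visit is already a feature vector; the paper's noise scheme and layer sizes are not specified here, so the values below are placeholders.

```python
import torch
import torch.nn as nn

class RNNDAE(nn.Module):
    def __init__(self, input_dim, hidden_dim=64):
        super().__init__()
        self.encoder = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, input_dim)

    def forward(self, x, noise_std=0.1):
        noisy = x + noise_std * torch.randn_like(x)   # denoising corruption
        _, h = self.encoder(noisy)                    # h: (1, B, hidden) patient summary
        dec_in = h.transpose(0, 1).repeat(1, x.size(1), 1)
        dec_out, _ = self.decoder(dec_in)
        return self.out(dec_out), h.squeeze(0)        # reconstruction + "Deep Feature"

model = RNNDAE(input_dim=32)
x = torch.randn(4, 10, 32)                            # 4 patients, 10 visits each
recon, patient_vec = model(x)
loss = nn.functional.mse_loss(recon, x)               # reconstruct the clean input
```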


Subject(s)
Electronic Health Records , Forecasting , Machine Learning , Algorithms , China , Comorbidity , Heart Failure , Humans , Mortality , Neural Networks, Computer
9.
Chin Med Sci J ; 34(2): 90-102, 2019 Jun 30.
Article in English | MEDLINE | ID: mdl-31315750

ABSTRACT

Regional healthcare platforms collect clinical data from hospitals in specific areas for the purpose of healthcare management, and reusing these data for clinical research is a common requirement. However, we face challenges such as inconsistent terminology in electronic health records (EHRs) and the complexities of data quality and data formats on regional healthcare platforms. In this paper, we propose a methodology and process for constructing large-scale cohorts, which form the basis of causality and comparative effectiveness studies in epidemiology. We first constructed a Chinese terminology knowledge graph to deal with the diversity of vocabularies on the regional platform. Second, we built special disease case repositories (e.g., a heart failure repository) that utilize the graph to search for related patients and to normalize the data. Driven by a clinical research question on the effect of statins on 180-day readmission in patients with heart failure, we built a large-scale retrospective cohort of 29,647 heart failure cases from the heart failure repository. After propensity score matching, a study group (n=6346) and a control group (n=6346) with parallel clinical characteristics were obtained. Logistic regression analysis showed that taking statins was negatively correlated with 180-day readmission in heart failure patients. This paper presents a workflow and an application example of big data mining based on regional EHR data.
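
A hedged sketch of the matching step: fit a propensity model and perform greedy 1:1 nearest-neighbor matching without replacement; the caliper and covariates are placeholders, not the study's actual specification.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def propensity_match(X, treated, caliper=0.05):
    """X: (n, p) covariates; treated: (n,) 0/1 statin exposure. Returns matched index pairs."""
    ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
    t_idx = np.where(treated == 1)[0]
    c_idx = list(np.where(treated == 0)[0])
    pairs = []
    for i in t_idx:
        if not c_idx:
            break
        dists = [abs(ps[j] - ps[i]) for j in c_idx]
        k = int(np.argmin(dists))
        if dists[k] < caliper:
            pairs.append((i, c_idx.pop(k)))   # match without replacement
    return pairs  # matched study/control pairs feed the readmission analysis
```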


Subject(s)
Electronic Health Records , Heart Failure/diagnosis , Algorithms , Cohort Studies , Data Mining , Female , Heart Failure/pathology , Humans , Male , Propensity Score , Retrospective Studies
10.
ScientificWorldJournal ; 2014: 848631, 2014.
Article in English | MEDLINE | ID: mdl-24715819

ABSTRACT

Constructing an ontology manually is a time-consuming, error-prone, and tedious task. We present SSCO, a Chinese ontology built through self-supervised learning, which contains about 255 thousand concepts, 5 million entities, and 40 million facts. We explore the three largest online Chinese encyclopedias for ontology learning and describe how to transfer their structured knowledge, including article titles, category labels, redirection pages, taxonomy systems, and InfoBox modules, into ontological form. To avoid the errors in encyclopedias and to enrich the learnt ontology, we also apply machine learning methods. First, we show statistically and experimentally that self-supervised machine learning is practicable for Chinese relation extraction (at least for synonymy and hyponymy), and we train self-supervised models (SVMs and CRFs) for synonymy extraction, concept-subconcept relation extraction, and concept-instance relation extraction. The advantage of our approach is that all training examples are generated automatically from the structural information of the encyclopedias and a few general heuristic rules. Finally, we evaluate SSCO on two aspects, scale and precision: manual evaluation shows that the ontology has excellent precision, and comparison with other well-known ontologies and knowledge bases indicates high coverage; the experimental results also show that the self-supervised models markedly enrich SSCO.
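
A toy sketch of the self-supervision idea: harvest labeled relation examples straight from encyclopedia structure (redirects suggest synonyms, category labels suggest concept-instance pairs) before training classifiers on them; the heuristics and article format below are simplified stand-ins.

```python
def harvest_training_pairs(articles):
    """Derive synonym and instance-of pairs from encyclopedia page structure."""
    synonyms, instance_of = [], []
    for art in articles:
        for redirect in art.get("redirects", []):
            synonyms.append((redirect, art["title"]))       # redirect page -> synonym
        for cat in art.get("categories", []):
            instance_of.append((art["title"], cat))         # title is an instance of category
    return synonyms, instance_of

articles = [{"title": "心力衰竭", "redirects": ["心衰"], "categories": ["心脏疾病"]}]
print(harvest_training_pairs(articles))  # these pairs become automatic training examples
```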


Subject(s)
Encyclopedias as Topic , Internet , Learning , China , Humans
11.
Bioengineering (Basel) ; 11(3)2024 Feb 27.
Article in English | MEDLINE | ID: mdl-38534499

ABSTRACT

The construction of medical knowledge graphs (MKGs) is steadily progressing from manual to automatic methods, which inevitably introduce noise that can impair the performance of downstream healthcare applications. Existing error detection approaches depend on the topological structure and external labels of entities in MKGs to improve their quality. Nevertheless, owing to the cost of manual annotation and the imperfection of automatic algorithms, precise entity labels in MKGs cannot be readily obtained. To address these issues, we propose an approach named Enhancing error detection on Medical knowledge graphs via intrinsic labEL (EMKGEL). In the absence of precise external labels, we establish a hyper-view KG and a triplet-level KG to capture implicit label information and neighborhood information, respectively. Inspired by the success of graph attention networks (GATs), we introduce a hyper-view GAT to incorporate label messages and neighborhood information into representation learning. We score each triplet with a confidence value that combines local and global trustworthiness. To validate the effectiveness of our approach, we conducted experiments on three publicly available MKGs, namely PharmKG-8k, DiseaseKG, and DiaKG. Compared with the baseline models, the Precision@K value improved by 0.7%, 6.1%, and 3.6%, respectively, on these datasets. Furthermore, our method significantly outperformed the baselines on a general knowledge graph, NELL-995.
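
A minimal sketch of one way local and global trustworthiness could be combined into a triplet confidence score, using a TransE-style local term and neighborhood agreement as the global term; the weighting and both score definitions are assumptions, not EMKGEL's exact formulation.

```python
import torch
import torch.nn.functional as F

def triplet_confidence(h, r, t, neigh_h, neigh_t, alpha=0.5):
    """h, r, t: (d,) embeddings; neigh_h, neigh_t: (k, d) neighbor embeddings."""
    local = torch.sigmoid(-torch.norm(h + r - t))              # translational plausibility
    global_h = F.cosine_similarity(h, neigh_h.mean(0), dim=0)  # fit with head's neighborhood
    global_t = F.cosine_similarity(t, neigh_t.mean(0), dim=0)  # fit with tail's neighborhood
    return alpha * local + (1 - alpha) * (global_h + global_t) / 2  # low score -> likely error
```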

12.
Int J Med Inform ; 185: 105402, 2024 May.
Article in English | MEDLINE | ID: mdl-38467099

ABSTRACT

BACKGROUND: Gastric cancer (GC) is one of the most common malignant tumors in the world, posing a serious threat to human health. Current treatment strategies for gastric cancer emphasize a multidisciplinary team (MDT) consultation approach, yet there are numerous treatment guidelines and insights from clinical trials to weigh. Meanwhile, the application of AI-based clinical decision support systems (CDSSs) in tumor diagnosis and screening is increasing rapidly. OBJECTIVE: The purpose of this study is to (1) summarize the treatment decision process for GC according to the treatment guidelines in China and create a knowledge graph (KG) for GC, and (2) build a CDSS on that KG and conduct an initial feasibility evaluation of the system. METHODS: First, we summarized the decision-making process for the treatment of GC. Then we extracted the relevant decision nodes and relationships and used Neo4j to create the KG. After obtaining the initial node features, graph embedding algorithms such as Node2Vec and GraphSAGE were used to construct GC-CDSS. Finally, a retrospective cohort study was used to compare the consistency of treatment decisions between GC-CDSS and MDT. RESULTS: We introduce GC-CDSS, which is built on a knowledge graph of Chinese GC treatment guidelines. The KG defines four types of nodes and four types of relationships and comprises 207 nodes and 300 relationships in total. GC-CDSS is capable of providing dynamic, personalized diagnostic and treatment recommendations based on the patient's condition. In the retrospective cohort study comparing GC-CDSS recommendations with those of the MDT group, the overall consistency rate of treatment recommendations was 92.96%. CONCLUSIONS: We construct a KG-based GC treatment support system, GC-CDSS. GC-CDSS may help oncologists make treatment decisions more efficiently and promote standardization in primary healthcare settings.
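
A hedged sketch of loading guideline decision nodes and edges into Neo4j with the official Python driver; the node labels (`Stage`, `Treatment`), relationship type, connection settings, and sample edge are illustrative, not the paper's actual schema.

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def load_guideline(tx, edges):
    for src, rel, dst in edges:
        # Relationship types cannot be parameterized in Cypher, hence the concatenation.
        tx.run(
            "MERGE (a:Stage {name: $src}) "
            "MERGE (b:Treatment {name: $dst}) "
            "MERGE (a)-[:" + rel + "]->(b)",
            src=src, dst=dst,
        )

edges = [("cT1N0M0", "RECOMMENDS", "内镜下切除")]   # toy guideline edge
with driver.session() as session:
    session.execute_write(load_guideline, edges)
```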


Subject(s)
Decision Support Systems, Clinical , Stomach Neoplasms , Humans , Stomach Neoplasms/diagnosis , Stomach Neoplasms/therapy , Retrospective Studies , Pattern Recognition, Automated , Algorithms
13.
Comput Biol Med ; 153: 106516, 2023 02.
Article in English | MEDLINE | ID: mdl-36628914

ABSTRACT

Medical image segmentation is an essential task in clinical diagnosis and case analysis. Most existing methods are based on U-shaped convolutional neural networks (CNNs), one disadvantage of which is that long-term dependencies and global contextual connections cannot be established effectively, resulting in inaccurate segmentation. To fully use low-level features to enhance global features and to reduce the semantic gap between the encoding and decoding stages, we propose a novel Swin Transformer boosted U-Net (ST-Unet) for medical image processing, in which a Swin Transformer and CNNs serve as encoder and decoder, respectively. We then propose a novel Cross-Layer Feature Enhancement (CLFE) module to realize cross-layer feature learning and adopt a Spatial and Channel Squeeze & Excitation module to highlight the saliency of specific regions. Finally, we pass the features fused by the CLFE module through CNNs to recover low-level features and localize local features, realizing more accurate semantic segmentation. Experiments on the widely used public datasets Synapse and ISIC 2018 show that ST-Unet achieves a Dice score of 78.86 and a recall of 0.9243, outperforming most current medical image segmentation methods.
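
A sketch of the Spatial and Channel Squeeze & Excitation (scSE) block the paper adopts, following the commonly used formulation; the surrounding CLFE wiring of ST-Unet is not reproduced here.

```python
import torch
import torch.nn as nn

class SCSE(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.cse = nn.Sequential(                    # channel squeeze & excitation
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.sse = nn.Sequential(                    # spatial squeeze & excitation
            nn.Conv2d(channels, 1, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.cse(x) + x * self.sse(x)    # recalibrate channels and pixels

feat = torch.randn(2, 64, 56, 56)
print(SCSE(64)(feat).shape)  # torch.Size([2, 64, 56, 56])
```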


Subject(s)
Image Processing, Computer-Assisted , Learning , Neural Networks, Computer , Semantics
14.
IEEE Trans Nanobioscience ; 18(3): 306-315, 2019 07.
Article in English | MEDLINE | ID: mdl-30946674

ABSTRACT

Clinical named entity recognition (CNER) is a fundamental and crucial task for clinical and translational research. In recent years, deep learning methods have achieved significant success in CNER tasks. However, these methods depend greatly on recurrent neural networks (RNNs), which maintain a vector of hidden activations propagated through time and therefore take a long time to train. In this paper, we propose a residual dilated convolutional neural network with a conditional random field (RD-CNN-CRF) for Chinese CNER, which makes the computation asynchronous and thus dramatically shortens the training period. More specifically, Chinese characters and dictionary features are first projected into dense vector representations, then fed into the residual dilated convolutional neural network to capture contextual features. Finally, a conditional random field captures dependencies between neighboring tags and obtains the optimal tag sequence for the entire input. Computational results on the CCKS-2017 Task 2 benchmark dataset show that our proposed RD-CNN-CRF method competes favorably with state-of-the-art RNN-based methods in terms of both predictive performance and training time.
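
A sketch of one residual dilated convolution block of the kind the paper describes; the full model stacks such blocks over character-plus-dictionary embeddings and adds a CRF layer on top, both omitted here, and the channel sizes are placeholders.

```python
import torch
import torch.nn as nn

class ResidualDilatedBlock(nn.Module):
    def __init__(self, channels, dilation):
        super().__init__()
        # padding = dilation keeps the sequence length unchanged for kernel size 3
        self.conv = nn.Conv1d(channels, channels, kernel_size=3,
                              padding=dilation, dilation=dilation)
        self.relu = nn.ReLU()

    def forward(self, x):                   # x: (B, channels, seq_len)
        return self.relu(x + self.conv(x))  # residual connection keeps gradients healthy

x = torch.randn(2, 128, 50)
for d in (1, 2, 4):                         # growing dilation widens the receptive field
    x = ResidualDilatedBlock(128, d)(x)
print(x.shape)                              # torch.Size([2, 128, 50])
```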


Subject(s)
Electronic Health Records , Medical Informatics/methods , Natural Language Processing , Neural Networks, Computer , China , Databases, Factual , Humans
15.
Int J Med Inform ; 115: 10-17, 2018 07.
Article in English | MEDLINE | ID: mdl-29779711

ABSTRACT

OBJECTIVE: This paper constructs a mortality prediction system for heart failure (HF) patients based on a real-world dataset. Effective mortality prediction can improve resource allocation and clinical outcomes by avoiding inappropriate overtreatment of low-mortality patients and premature discharge of high-mortality patients. The system covers three prediction targets: in-hospital mortality, 30-day mortality, and 1-year mortality. MATERIALS AND METHODS: HF data were collected from Shanghai Shuguang Hospital: 10,203 in-patient records were extracted from encounters occurring between March 2009 and April 2016, involving 4682 patients and 539 death cases. A feature selection method called the Orthogonal Relief (OR) algorithm is first used to reduce dimensionality. Then, a classification algorithm named Dynamic Radius Means (DRM) is proposed to predict mortality in HF patients. RESULTS AND DISCUSSION: Comparative experimental results demonstrate that the system achieves high performance on all targets with DRM. Notably, in-hospital mortality prediction achieves an AUC of 87.3% (a 35.07% improvement), and the AUCs of 30-day and 1-year mortality prediction reach 88.45% and 84.84%, respectively. In particular, the system remains effective and does not deteriorate when the dimensionality of the samples is sharply reduced. CONCLUSIONS: The proposed system with its own DRM method can predict mortality in HF patients and achieves high performance on all three targets. Furthermore, an effective feature selection strategy can boost the system. The system shows its value in real-world applications, assisting clinicians in HF treatment by providing crucial decision information.
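
A compact sketch of the classic Relief idea underlying the OR feature selector: reward features that separate a sample from its nearest miss and penalize those that differ from its nearest hit; the orthogonality extension the paper adds is not reproduced.

```python
import numpy as np

def relief_weights(X, y, n_iter=100, rng=np.random.default_rng(0)):
    """X: (n, p) features; y: (n,) binary labels. Returns per-feature weights."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        i = rng.integers(len(X))
        dists = np.abs(X - X[i]).sum(axis=1)      # L1 distance to every sample
        dists[i] = np.inf                          # exclude the sample itself
        same, diff = (y == y[i]), (y != y[i])
        hit = np.argmin(np.where(same, dists, np.inf))    # nearest same-class sample
        miss = np.argmin(np.where(diff, dists, np.inf))   # nearest other-class sample
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter  # higher weight -> more discriminative feature
```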


Subject(s)
Heart Failure/mortality , Models, Statistical , Aged , Algorithms , China , Female , Heart Failure/therapy , Hospital Mortality , Humans
16.
J Biomed Semantics ; 8(Suppl 1): 33, 2017 09 20.
Article in English | MEDLINE | ID: mdl-29297414

ABSTRACT

BACKGROUND: While a large number of well-known knowledge bases (KBs) in the life sciences have been published as Linked Open Data, there are few KBs in Chinese. However, Chinese KBs are necessary for automatically processing and analyzing electronic medical records (EMRs) in Chinese. Above all, a symptom KB in Chinese is the most urgently needed, since symptoms are the starting point of clinical diagnosis. RESULTS: We publish a public KB of symptoms in Chinese, covering symptoms, departments, diseases, medicines, and examinations, as well as the relations between symptoms and these related entities. To the best of our knowledge, there is no other KB focusing on symptoms in Chinese, and it is an important supplement to existing medical resources. Our KB is constructed by fusing data automatically extracted from eight mainstream healthcare websites and three Chinese encyclopedia sites, supplemented by symptoms extracted from a large number of EMRs. METHODS: First, we designed the data schema manually with reference to the Unified Medical Language System (UMLS). Second, we extracted entities from the eight healthcare websites, which were fed as seeds to train a multi-class classifier for classifying entities from the encyclopedia sites, and trained a Conditional Random Field (CRF) model to extract symptoms from EMRs. Third, we fused the data to resolve the large-scale duplication between different sources through entity type alignment, entity mapping, and attribute mapping. Finally, we linked our KB to UMLS to investigate the similarities and differences between symptoms in Chinese and English. CONCLUSIONS: The resulting KB contains more than 26,000 distinct symptoms in Chinese, including 3968 symptoms from traditional Chinese medicine and 1029 synonym pairs. It also includes concepts such as diseases and medicines, and relations between symptoms and these related entities. We released the KB as Linked Open Data with a demo at https://datahub.io/dataset/symptoms-in-chinese .
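
A toy sketch of the fusion step: align entities across sources by normalized-name matching and merge their attributes; the real pipeline also performs entity type alignment and attribute mapping, and the record format below is a simplified assumption.

```python
def fuse_entities(sources):
    """Merge entity records from multiple sites that describe the same symptom."""
    fused = {}
    for source in sources:
        for ent in source:
            key = ent["name"].strip().lower()      # naive name normalization
            rec = fused.setdefault(key, {"name": ent["name"], "attrs": {}})
            for k, v in ent.get("attrs", {}).items():
                rec["attrs"].setdefault(k, v)      # keep the first value per attribute
    return list(fused.values())

siteA = [{"name": "头痛", "attrs": {"department": "神经内科"}}]
siteB = [{"name": "头痛", "attrs": {"related_disease": "偏头痛"}}]
print(fuse_entities([siteA, siteB]))  # one merged record with both attributes
```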


Subject(s)
Disease , Knowledge Bases , Language , Medical Informatics/methods , Automation , Data Mining , Electronic Health Records