Results 1 - 12 of 12
1.
Comput Biol Med ; 170: 107936, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38244473

ABSTRACT

Drug repurposing is a strategy that aims to uncover novel medical indications for approved drugs. This discovery process can be effectively represented as a link prediction task within a medical knowledge graph, in which the missing relation between a disease entity and a drug entity is predicted. Typically, the links to be predicted pertain to rare types, which makes the task one of few-shot link prediction. However, sparse neighborhood information and weak triplet interactions result in less effective representations, which poses great challenges for few-shot link prediction. Therefore, in this paper, we proposed a meta-learning framework based on a multi-level attention network (MLAN) to capture valuable information in the few-shot scenario for drug repurposing. First, the proposed method used a gating mechanism to filter noisy information and a graph attention network to highlight valuable neighborhood information. Second, the proposed commonality relation learner, which employs a set transformer, captured triplet-level interactions while remaining insensitive to the size of the support set. Finally, a model-agnostic meta-learning training strategy was employed to optimize the model quickly on each meta task. We validated the proposed method on two datasets specifically designed for few-shot link prediction in the medical field: COVID19-One and BIOKG-One. Experimental results showed that the proposed model had significant advantages over state-of-the-art few-shot link prediction methods. The results also highlighted the value of integrating these components within a unified meta-learning framework for drug repurposing.


Subject(s)
COVID-19 , Drug Repositioning , Humans , Learning
2.
Comput Methods Programs Biomed ; 242: 107838, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37832431

ABSTRACT

BACKGROUND AND OBJECTIVE: Clinical risk prediction for patients is an important research issue in healthcare and is of great significance for the diagnosis, treatment and prevention of diseases. In recent years, a large number of deep learning-based methods have been proposed for clinical prediction by mining relevant features of patients' health condition from historical Electronic Health Records (EHRs) data. However, most of these existing methods focus only on discovering the time series characteristics of physiological indexes such as laboratory tests and physical examinations, and fail to comprehensively consider how far these physiological indexes deviate from the normal range and how stable they are, which greatly limits prediction performance. METHODS: We propose a personalized clinical time-series representation learning framework via abnormal offsets analysis, named PARSE, for clinical risk prediction. In PARSE, while extracting relevant temporal features from the original EHR data, we further capture features of abnormal condition as complementary information from two sources: the absolute offset of each physiological index's observed values from its normal value, and the relative offset between each physiological index's observed values in two adjacent time steps. Finally, an adaptive fusion module is introduced to integrate the above features and obtain personalized patient representations for clinical risk prediction. RESULTS: We conduct an in-hospital mortality prediction task on two public real-world datasets. PARSE achieves the highest F1 scores of 48.1% and 40.3%, outperforming the state-of-the-art methods with a boost of 2.4% and 6.2% on the two datasets, respectively. Furthermore, the results of ablation experiments demonstrate that the two abnormal offsets and the proposed adaptive fusion method each contribute to performance. 
CONCLUSIONS: PARSE can better extract the risk-related information from the EHRs data and improve the personalization of the patients' representations. Each part of PARSE improves the final prediction performance independently.
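
The two offsets described in the METHODS of this abstract can be illustrated with a minimal Python sketch. It is not the PARSE implementation: the function name `abnormal_offsets`, the use of a single scalar `normal_value` per index, the signed (rather than magnitude-only) difference, and the zero padding of the first time step are all assumptions made for illustration.

```python
from typing import List, Tuple


def abnormal_offsets(values: List[float], normal_value: float) -> Tuple[List[float], List[float]]:
    """Return (absolute offsets, relative offsets) for one physiological index.

    absolute offset: observed value minus an assumed scalar normal value.
    relative offset: observed value minus the value at the previous time step.
    """
    absolute = [v - normal_value for v in values]
    # The first time step has no predecessor; padding with 0.0 is an assumption.
    relative = [0.0] + [curr - prev for prev, curr in zip(values, values[1:])]
    return absolute, relative


# Hypothetical heart-rate readings over four time steps.
heart_rate = [72.0, 80.0, 95.0, 90.0]
abs_off, rel_off = abnormal_offsets(heart_rate, normal_value=75.0)
print(abs_off)  # [-3.0, 5.0, 20.0, 15.0]
print(rel_off)  # [0.0, 8.0, 15.0, -5.0]
```

In a model of this kind, such offset sequences would presumably be fed to the network alongside the raw series; how PARSE encodes and fuses them is not specified in the abstract.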


Subject(s)
Electronic Health Records , Humans , Time Factors
3.
J Biomed Inform ; 145: 104456, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37482171

ABSTRACT

Triplet extraction is one of the fundamental tasks in biomedical text mining. Compared with traditional pipeline approaches, joint methods can alleviate the error propagation problem from entity recognition to relation classification. However, existing methods face challenges in detecting overlapping entities and overlapping relations, which are ubiquitous in biomedical texts. In this work, we propose a novel pipeline method for end-to-end biomedical triplet extraction. In particular, a span-based detection strategy is used to detect overlapping triplets by enumerating possible candidate spans and entity pairs. The strategy is further used to capture different contextualized representations via an entity model and a relation model, respectively. Furthermore, to enhance the interrelation between spans, entity information from the output of the entity model is used to construct the input for the relation model without using any external knowledge. Our approach is evaluated on the drug-drug interaction (DDI) and chemical-protein interaction (CHEMPROT) datasets, improving the absolute F1-score in relation extraction by 3.5%-3.7% compared with prior work. The experimental results highlight the importance of overlapping triplet detection using the span-based approach, acquisition of various contextualized representations via different in-domain pre-trained language models, and early fusion of entity information in the relation model.
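
The span enumeration step mentioned in this abstract can be sketched as follows. This is only an illustration of the general span-based idea, not the authors' code: the helper names `enumerate_spans` and `candidate_pairs`, the `max_len` cap, and the use of ordered pairs are assumptions.

```python
from itertools import permutations
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def enumerate_spans(tokens: List[str], max_len: int) -> List[Span]:
    """List every contiguous token span of length 1..max_len."""
    n = len(tokens)
    return [(i, j) for i in range(n) for j in range(i + 1, min(i + max_len, n) + 1)]


def candidate_pairs(entities: List[Span]) -> List[Tuple[Span, Span]]:
    """List every ordered (head, tail) pair of distinct entity spans."""
    return list(permutations(entities, 2))


tokens = ["aspirin", "inhibits", "COX-1"]
spans = enumerate_spans(tokens, max_len=2)
print(spans)  # [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]

# Suppose an entity model (not shown) kept these two spans as entities.
pairs = candidate_pairs([(0, 1), (2, 3)])
print(pairs)  # [((0, 1), (2, 3)), ((2, 3), (0, 1))]
```

Because spans are scored independently, nested or overlapping spans such as (0, 1) and (0, 2) can both be kept, which is what allows overlapping entities and relations to be detected.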


Subject(s)
Data Mining , Language , Data Mining/methods , Natural Language Processing , Proteins , Drug Interactions
4.
Comput Intell Neurosci ; 2022: 4207940, 2022.
Article in English | MEDLINE | ID: mdl-36567811

ABSTRACT

Accurately predicting the clinical endpoint in the ICU based on the patient's electronic medical records (EMRs) is essential for the timely treatment of critically ill patients and the allocation of medical resources. However, a patient's EMRs usually consist of a large amount of heterogeneous multivariate time series data, such as laboratory tests and vital signs, which are produced irregularly. Most existing methods fail to effectively model the time irregularity inherent in longitudinal patient medical records or to capture the interrelationships among different types of data. To tackle these limitations, we propose a novel time-aware transformer-based hierarchical attention network (TERTIAN) for clinical endpoint prediction. In this model, a time-aware transformer is introduced to learn the personalized irregular temporal patterns of medical events, and a hierarchical attention mechanism is deployed to obtain an accurate fused patient representation by comprehensively mining the interactions and correlations among multiple types of medical data. We evaluate our model on the MIMIC-III and MIMIC-IV datasets for the task of mortality prediction, and the results show that TERTIAN achieves higher performance than state-of-the-art approaches.


Subject(s)
Electric Power Supplies , Learning , Humans , Time Factors , Intensive Care Units
5.
Comput Biol Med ; 151(Pt A): 106245, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36335809

ABSTRACT

Predicting a patient's future health status from historical temporal Electronic Health Records (EHRs) is an important research task in the field of medical big data. Most existing deep learning-based medical prediction methods focus only on the patient's individual information. However, due to the sparseness and low quality of EHR data, the individual clinical records of a single patient often cannot provide complete health information, which severely limits the accuracy of the prediction models. In this paper, we propose a Multi-graph attEntive Representation learning framework integrating Group information from similar patiEnts (MERGE) for medical prediction. In this framework, while the individual representation learning module captures the individual patient's temporal characteristics, the group representation learning module learns group representations of similar patients from different aspects as a supplement, thereby effectively improving the accuracy of patients' representations. We evaluate our method on the MIMIC-III dataset for the task of in-hospital mortality prediction and on the Xiangya dataset for cardiovascular diseases (CVDs) prediction. The experimental results show that MERGE outperforms the state-of-the-art methods.


Subject(s)
Big Data , Electronic Health Records , Humans , Forecasting
6.
Artif Intell Med ; 127: 102282, 2022 May.
Article in English | MEDLINE | ID: mdl-35430042

ABSTRACT

Clinical named entity recognition (CNER) is a fundamental step for many clinical Natural Language Processing (NLP) systems. It aims to recognize and classify clinical entities such as diseases, symptoms, exams, body parts and treatments in clinical free texts. In recent years, with the development of deep learning technology, deep neural networks (DNNs) have been widely used in Chinese clinical named entity recognition and many other clinical NLP tasks. However, these state-of-the-art models fail to make full use of the global information and multi-level semantic features in clinical texts. We design an improved character-level representation approach that integrates the character embedding and the character-label embedding to enhance the specificity and diversity of feature representations. Then, a multi-head self-attention based Bi-directional Long Short-Term Memory Conditional Random Field (MUSA-BiLSTM-CRF) model is proposed. By introducing multi-head self-attention and incorporating a medical dictionary, the model can more effectively capture the weight relationships between characters and multi-level semantic feature information, which is expected to greatly improve the performance of Chinese clinical named entity recognition. We evaluate our model on two CCKS challenge benchmark datasets (CCKS2017 Task 2 and CCKS2018 Task 1), and the experimental results show that our proposed model achieves the best performance compared with state-of-the-art DNN-based methods.


Subject(s)
Electronic Health Records , Natural Language Processing , China , Language , Neural Networks, Computer
7.
BMC Med Inform Decis Mak ; 22(1): 72, 2022 Mar 23.
Article in English | MEDLINE | ID: mdl-35321705

ABSTRACT

OBJECTIVE: Pituitary adenomas are the most common type of pituitary disorder. They usually occur in young adults and often affect the patient's physical development, labor capacity and fertility. Clinical free texts in the electronic medical records (EMRs) of pituitary adenoma patients contain abundant diagnosis and treatment information. However, this information has not been well utilized because of the difficulty of extracting information from unstructured clinical texts. This study aims to enable machines to intelligently process clinical information and automatically extract clinical named entities for pituitary adenomas from Chinese EMRs. METHODS: The clinical corpus used in this study came from one pituitary adenoma neurosurgery treatment center of a 3A hospital in China. Four types of fine-grained clinical record texts were selected: notes on present illness, past medical history, case characteristics and family history of 500 pituitary adenoma inpatients. Dictionary-based matching, conditional random fields (CRF), bidirectional long short-term memory with CRF (BiLSTM-CRF), and bidirectional encoder representations from transformers with BiLSTM-CRF (BERT-BiLSTM-CRF) were used to extract clinical entities from a Chinese EMRs corpus. A comprehensive dictionary was constructed from open source vocabularies and a domain dictionary for pituitary adenomas to support the dictionary-based matching method. We selected features such as part of speech, radical, document type, and the position of characters to train the CRF-based model. Random character embeddings and character embeddings pretrained by BERT were used as the input features for the BiLSTM-CRF model and the BERT-BiLSTM-CRF model, respectively. Both a strict metric and a relaxed metric were used to evaluate the performance of these methods. 
RESULTS: Experimental results demonstrated that the deep learning and other machine learning methods were able to automatically extract clinical named entities, including symptoms, body regions, diseases, family histories, surgeries, medications, and disease courses of pituitary adenomas, from Chinese EMRs. With regard to overall performance, BERT-BiLSTM-CRF achieved the highest strict F1 value of 91.27% and the highest relaxed F1 value of 95.57%. Additional evaluations showed that BERT-BiLSTM-CRF performed best on almost all entity types except surgery and disease course. BiLSTM-CRF performed best in disease course entity recognition, and performed as well as the CRF model for part of speech, radical and document type features, with both strict and relaxed F1 values reaching 96.48%. The CRF model with part of speech, radical and document type features performed best in surgery entity recognition, with a relaxed F1 value of 95.29%. CONCLUSIONS: In this study, we compared four entity recognition methods for pituitary adenomas based on Chinese EMRs. The results demonstrate that deep learning methods can effectively extract various types of clinical entities with satisfactory performance. This study contributes to clinical named entity extraction from Chinese neurosurgical EMRs. The findings could also assist information extraction from other Chinese medical texts.
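
The difference between a strict and a relaxed F1 can be made concrete with a short Python sketch. The abstract does not define the two metrics, so the definitions here are assumptions based on common practice: strict requires identical boundaries and type, relaxed requires the same type and any character overlap. The function name `entity_f1` and the example entities are hypothetical.

```python
from typing import List, Tuple

Entity = Tuple[int, int, str]  # (start, end exclusive, entity type)


def _match(a: Entity, b: Entity, relaxed: bool) -> bool:
    """Assumed matching rule: same type, and exact or overlapping boundaries."""
    if a[2] != b[2]:
        return False
    if relaxed:
        return a[0] < b[1] and b[0] < a[1]  # spans overlap
    return (a[0], a[1]) == (b[0], b[1])     # spans are identical


def entity_f1(gold: List[Entity], pred: List[Entity], relaxed: bool = False) -> float:
    """Entity-level F1 under the strict or relaxed matching rule."""
    pred_hits = sum(any(_match(p, g, relaxed) for g in gold) for p in pred)
    gold_hits = sum(any(_match(g, p, relaxed) for p in pred) for g in gold)
    precision = pred_hits / len(pred) if pred else 0.0
    recall = gold_hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [(0, 3, "disease"), (5, 7, "symptom")]
pred = [(0, 3, "disease"), (5, 8, "symptom")]  # second boundary is off by one
strict_f1 = entity_f1(gold, pred, relaxed=False)
relaxed_f1 = entity_f1(gold, pred, relaxed=True)
print(strict_f1, relaxed_f1)  # 0.5 1.0
```

Under these assumed definitions, a boundary error counts as a miss for the strict score but as a hit for the relaxed score, which is why relaxed values are never lower than strict ones.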


Subject(s)
Electronic Health Records , Pituitary Neoplasms , Humans , Information Storage and Retrieval , Language , Natural Language Processing , Pituitary Neoplasms/diagnosis
8.
JMIR Med Inform ; 9(7): e28218, 2021 Jul 22.
Article in English | MEDLINE | ID: mdl-34057414

ABSTRACT

BACKGROUND: Pituitary adenoma is one of the most common central nervous system tumors. The diagnosis and treatment of pituitary adenoma remain very difficult. Misdiagnosis and recurrence often occur, and experienced neurosurgeons are in serious shortage. A knowledge graph can help interns quickly understand the medical knowledge related to pituitary tumors. OBJECTIVE: The aim of this study was to develop a data fusion method suitable for medical data, using data on pituitary adenomas integrated from different sources. The overall goal was to construct a knowledge graph for pituitary adenoma (KGPA) to be used for knowledge discovery. METHODS: A complete framework suitable for the construction of a medical knowledge graph was developed and used to build the KGPA. The schema of the KGPA was manually constructed. Information on pituitary adenoma was automatically extracted from Chinese electronic medical records (CEMRs) and medical websites through a conditional random field model and newly designed web wrappers. An entity fusion method based on the head-and-tail entity fusion model was proposed to fuse the data from heterogeneous sources. RESULTS: Data were extracted from 300 CEMRs of pituitary adenoma and 4 health portals. Entity fusion was carried out using the proposed data fusion model. The F1 scores of the head and tail entity fusions were 97.32% and 98.57%, respectively. Triples from the constructed KGPA were selected for evaluation, demonstrating 95.4% accuracy. CONCLUSIONS: This paper introduces an approach to fuse triples extracted from heterogeneous data sources, which can be used to build a knowledge graph. The evaluation results showed that the data in the KGPA are of high quality. The constructed KGPA can help physicians in clinical practice.

9.
IEEE/ACM Trans Comput Biol Bioinform ; 18(3): 1093-1105, 2021.
Article in English | MEDLINE | ID: mdl-31425047

ABSTRACT

High-risk prediction of cardiovascular disease is of great significance and urgency in medicine, given the increasing prevalence of sub-health in recent years. Most existing pathological methods for prognosis prediction are either costly or prone to misjudgement. Therefore, many automated models based on machine learning have been proposed to predict the onset of cardiovascular disease from the premorbid information of patients extracted from their historical Electronic Health Records (EHRs). However, it is difficult to select proper features from longitudinal and heterogeneous EHRs, and it is also a great challenge to obtain accurate and robust representations for patients. In this paper, we propose an entirely end-to-end model called DeepRisk, based on an attention mechanism and deep neural networks, which can not only learn high-quality features automatically from EHRs but also efficiently integrate heterogeneous and time-ordered medical data, and finally predict patients' risk of cardiovascular diseases. Experiments are carried out on a real medical dataset, and the results show that DeepRisk can significantly improve the high-risk prediction accuracy for cardiovascular disease compared with state-of-the-art approaches.


Subject(s)
Cardiovascular Diseases/diagnosis , Deep Learning , Medical Informatics/methods , Adult , Algorithms , Data Mining , Electronic Health Records , Female , Humans , Male , Middle Aged , Neural Networks, Computer , Risk Assessment
10.
J Biomed Inform ; 114: 103666, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33352331

ABSTRACT

Compared with the general complex network, the multilayer network is better suited to describing reality. It can be used as a tool of network pharmacology to analyze the mechanism of drug action from an overall perspective. Combined with a random walk algorithm, it measures the importance of nodes across the entire network rather than within a single layer. Here, a four-layer network consisting of ingredients, target proteins, metabolic pathways and diseases was constructed from data about the action process of prescriptions. The random walk algorithm was used to calculate the betweenness centrality of the protein layer nodes and rank their importance. Using this method, we screened out the top 10% of proteins that play a key role in treatment. The prescription Xiaochaihu Decoction was taken as an example to validate our method. The selected proteins were compared with those that have been validated to be associated with the treated diseases. The results showed that the accuracy of our method was no less than that of the topology-based method for a single-layer network. The applicability of our method was confirmed with another prescription, Yupingfeng Decoction. Our study demonstrated that a multilayer network combined with a random walk algorithm is an effective method for pre-screening vital target proteins related to prescriptions.


Subject(s)
Algorithms , Proteins
11.
BMC Gastroenterol ; 20(1): 180, 2020 Jun 09.
Article in English | MEDLINE | ID: mdl-32517710

ABSTRACT

BACKGROUND: Nonalcoholic fatty liver disease (NAFLD) is a risk factor for colorectal neoplasms. Our goal is to explore the relationship between NAFLD and colorectal cancer (CRC) and to analyze potential indicators for screening CRC in NAFLD based on clinical big data. METHODS: Demographic information and routine clinical indicators were extracted from the Xiangya Medical Big Data Platform. 35,610 NAFLD cases without CRC (as group NAFLD-nonCRC), 306 NAFLD cases with CRC (as group NAFLD-CRC) and 10,477 CRC cases without NAFLD (as group nonNAFLD-CRC) were selected and evaluated. The CRC incidence was compared between the NAFLD population and the general population by Chi-square test. An independent sample t-test was used to find differences in age, gender and routine clinical indicators in pairwise comparisons of NAFLD-CRC, NAFLD-nonCRC and nonNAFLD-CRC. RESULTS: The NAFLD population had a higher CRC incidence than the general population (7.779‰ vs 3.763‰, P < 0.001). The average age of NAFLD-CRC (58.79 ± 12.353) or nonNAFLD-CRC (59.26 ± 13.156) was significantly higher than that of NAFLD-nonCRC (54.15 ± 14.167, P < 0.001), but age did not differ significantly between NAFLD-CRC and nonNAFLD-CRC (P > 0.05). Gender distribution did not differ among the three groups (P > 0.05). NAFLD-CRC had lower anemia-related routine clinical indicators, such as red blood cell count, mean hemoglobin content and hemoglobin, than NAFLD-nonCRC (P < 0.05 for all). Anemia in NAFLD-CRC was typical, but it might be milder than in nonNAFLD-CRC. More interestingly, NAFLD-CRC had distinct characteristics of the leukocyte system, such as lower white blood cell count (WBC) and neutrophil count (NEU_C) and higher basophil percentage (BAS_Per), than nonNAFLD-CRC and NAFLD-nonCRC (P < 0.05 for all). Compared with NAFLD-nonCRC, the change of WBC, BAS_Per and NEU_C in NAFLD-CRC was different from that in nonNAFLD-CRC. 
In addition, NAFLD-CRC had higher levels of low density lipoprotein (LDL) and high density lipoprotein (HDL), and lower levels of triglyceride (TG) and albumin-to-globulin ratio (A/G), than NAFLD-nonCRC (P < 0.05 for all). CONCLUSIONS: NAFLD is associated with a high incidence of CRC. Age is an important factor for CRC, and the CRC incidence increases with age. Anemia-related routine blood indicators, the leukocyte system and blood lipid indicators may be the more important variables for identifying CRC in NAFLD. Routine blood tests and liver function/blood lipid tests are therefore valuable for screening CRC in NAFLD.


Subject(s)
Colorectal Neoplasms/diagnosis , Colorectal Neoplasms/epidemiology , Early Detection of Cancer/methods , Non-alcoholic Fatty Liver Disease/complications , Age Factors , Aged , Biomarkers, Tumor/blood , Chi-Square Distribution , Colorectal Neoplasms/etiology , Female , Humans , Incidence , Lipids/blood , Liver Function Tests , Male , Middle Aged , Non-alcoholic Fatty Liver Disease/blood , Risk Factors
12.
JMIR Med Inform ; 5(1): e2, 2017 Feb 17.
Article in English | MEDLINE | ID: mdl-28213343

ABSTRACT

BACKGROUND: As one of several effective solutions for personal privacy protection, a global unique identifier (GUID) is linked with hash codes that are generated from combinations of personally identifiable information (PII) by a one-way hash algorithm. On the GUID server, no PII is permitted to be stored; only the GUID and hash codes are allowed. The quality of PII entry is critical to the GUID system. OBJECTIVE: The goal of our study was to explore a method of checking questionable PII entries in this context without using or sending any portion of the PII while registering a subject. METHODS: According to the principle of the GUID system, all possible combination patterns of PII fields were analyzed and used to generate hash codes, which were stored on the GUID server. Based on the matching rules of the GUID system, an error-checking algorithm was developed using set theory to check PII entry errors. We selected 200,000 simulated individuals with randomly planted errors to evaluate the proposed algorithm. These errors were placed in the required PII fields or optional PII fields. The performance of the proposed algorithm was also tested in the registration system for study subjects. RESULTS: Of the 127,700 error-planted subjects, 114,464 (89.64%) could still be identified as the previously registered subject, and the remaining 13,236 (10.36%) were classified as new subjects. As expected, 100% of the nonidentified subjects had errors within the required PII fields. The likelihood that a subject is identified is related to the count and the type of incorrect PII fields. For all identified subjects, their errors can be found by the proposed algorithm. The scope of questionable PII fields is also associated with the count and the type of incorrect PII fields. In the best case, the exact incorrect PII fields are found; in the worst case, the questionable scope is narrowed only to a set of 13 PII fields. 
In practice, the proposed algorithm can give a hint of questionable PII entry and serve as an effective tool. CONCLUSIONS: The GUID system has high error tolerance and may correctly identify and associate a subject even with a few PII field errors. Correct data entry, especially for required PII fields, is critical to avoiding false splits. In the context of one-way hash transformation, questionable PII input may be identified by applying set theory operators to the hash codes. The count and the type of incorrect PII fields play an important role in identifying a subject and locating questionable PII fields.
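
The idea of locating questionable fields from hash codes alone can be sketched in Python. This is an assumption-laden illustration, not the GUID system's algorithm: the field names, the three combination patterns in `COMBOS`, the use of SHA-256, the normalization step, and the rule "a field is questionable if it appears only in mismatched combinations" are all hypothetical.

```python
import hashlib
from typing import Dict, List, Set, Tuple

Combo = Tuple[str, ...]

# Hypothetical combination patterns of PII fields (the real system's patterns are not given).
COMBOS: List[Combo] = [
    ("first_name", "last_name", "birth_date"),
    ("first_name", "birth_date", "birth_city"),
    ("last_name", "birth_city", "sex"),
]


def hash_codes(pii: Dict[str, str], combos: List[Combo] = COMBOS) -> Dict[Combo, str]:
    """Compute one one-way hash code per combination; only these codes leave the client."""
    codes = {}
    for combo in combos:
        joined = "|".join(pii[field].strip().lower() for field in combo)
        codes[combo] = hashlib.sha256(joined.encode("utf-8")).hexdigest()
    return codes


def questionable_fields(stored: Dict[Combo, str], entered: Dict[Combo, str]) -> Set[str]:
    """Fields in mismatched combinations, minus fields confirmed by a matching combination."""
    matched = {combo for combo in stored if entered.get(combo) == stored[combo]}
    mismatched = set(stored) - matched
    suspect = {field for combo in mismatched for field in combo}
    confirmed = {field for combo in matched for field in combo}
    return suspect - confirmed


original = {"first_name": "Ana", "last_name": "Li", "birth_date": "1990-01-02",
            "birth_city": "Changsha", "sex": "F"}
retyped = dict(original, birth_date="1990-01-20")  # one planted entry error

stored_codes = hash_codes(original)   # what the server would hold
entered_codes = hash_codes(retyped)   # what a new registration would send
suspects = questionable_fields(stored_codes, entered_codes)
print(sorted(suspects))  # ['birth_date', 'first_name']
```

With only three patterns the check narrows the scope to two fields rather than pinpointing `birth_date`; more combination patterns would give more matching sets to subtract, which fits the abstract's observation that the questionable scope depends on the count and type of incorrect fields.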
