Results 1 - 20 of 21
1.
Apoptosis ; 29(1-2): 229-242, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37751105

ABSTRACT

PANoptosis has recently been described as a new type of cell death. The term mainly refers to the significant interaction among three programmed cell death pathways: apoptosis, necroptosis, and pyroptosis. Despite this, only a few studies have systematically examined the literature in this area. By analyzing the bibliometric data for PANoptosis, we can visualize current hotspots and predicted trends in research. This study analyzed bibliometric indicators for PANoptosis literature indexed in the Web of Science and published between 2016 and 2022, using the Histcite Pro 2.0 tool; research trends and hotspots were visualized using VOSviewer, CiteSpace and BioBERT. The output of related literature was low in the four years from the first presentation of PANoptosis in 2016 to 2020. The volume of relevant literature then grew exponentially between 2020 and 2022. The United States and China play a leading role in this field. Although China started late, its research in this field is developing rapidly. As research progressed, more focus was placed on the relationship between PANoptosis and pyroptosis, as well as apoptosis and necrosis. PANoptosis research is now in a stage of rapid development. Most of the research focuses on the cellular level, and the focus is more on the treatment of tumor-related diseases. The current focus of this area is PANoptosis mechanisms in cancer and inflammation. Burst analysis of keywords shows that caspase1 and host defense have consistently been research hotspots in the field of PANoptosis, while the frequency of NLRC4, causes of autoinflammation, recognition, NLRP3, and Gasdermin D has gradually increased, making them research hotspots in recent years.
Finally, we used the BioBERT biomedical language model to mine the most documented genes and diseases in the PANoptosis field articles, pointing out the direction for subsequent research steps. According to a bibliometric analysis, researchers have shown an increased interest in PANoptosis over the past few years. Researchers initially focused on the molecular mechanism of PANoptosis and pyroptosis, apoptosis, and necroptosis. The role of PANoptosis in diseases and conditions such as inflammation and tumors is one of the current research hotspots in this area. The focus is more on treating inflammation-related diseases, which will become the key development direction of future research.


Subjects
Apoptosis , Pattern Recognition, Automated , Humans , Cell Death , Bibliometrics , Inflammation
2.
Ecotoxicol Environ Saf ; 281: 116671, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38959788

ABSTRACT

BACKGROUND: With the advancement of medical technology, tools such as electrosurgical equipment, laser knives, and ultrasonic scalpels have made modern medical procedures more convenient and effective. However, the generation of surgical smoke during these procedures poses significant health risks to medical personnel. Despite this, only a few studies have examined the literature systematically in this area. By analyzing bibliometric data on surgical smoke, we can gain insights into current research hotspots and forecast future trends. METHODS: This study included literature related to surgical smoke from the Web of Science and China National Knowledge Infrastructure (CNKI) databases, covering the period from 2000 to 2024. We used VOSviewer, CiteSpace, and BioBERT to visualize research trends and hotspots. RESULTS: In the early stages of research, the focus was mainly on the composition, generation mechanisms, and susceptible populations related to surgical smoke. In recent years, with the development of laparoscopic surgery and the global COVID-19 pandemic, research interests have shifted towards occupational protection of healthcare workers and public health. Currently, the research in this field primarily explores the promoting effects of surgical smoke on conditions such as inflammation and tumors, as well as occupational protection and health education for healthcare workers. Disease research focuses heavily on Smoke Inhalation Injury, Infections, Neoplasms, Postoperative Complications, and Inflammation. CONCLUSION: We explored future research directions in the field of surgical smoke using VOSviewer, CiteSpace, and BioBERT. Our findings indicate that current research focuses on investigating the promoting effects of surgical smoke on conditions such as inflammation and tumors, as well as on occupational protection and health education for healthcare workers. 
We summarized existing preventive measures, aiming to facilitate further research advancements and the translation of research outcomes into clinical results. These efforts provide new insights for advancing research in occupational protection of healthcare workers.


Subjects
Occupational Exposure , Smoke , Humans , Bibliometrics , China , Health Personnel/statistics & numerical data , Smoke/adverse effects
3.
BMC Bioinformatics ; 24(1): 42, 2023 Feb 08.
Article in English | MEDLINE | ID: mdl-36755230

ABSTRACT

BACKGROUND: The biomedical literature is growing rapidly, and it is increasingly important to extract meaningful information from the vast amount of literature. Biomedical named entity recognition (BioNER) is one of the key and fundamental tasks in biomedical text mining. It also acts as a primitive step for many downstream applications such as relation extraction and knowledge base completion. Therefore, the accurate identification of entities in biomedical literature has research value. However, this task is challenging due to the limitations of sequence labeling and the lack of large-scale labeled training data and domain knowledge. RESULTS: In this paper, we use a novel word-pair classification method, design a simple attention mechanism and propose a novel architecture to address the difficulties of BioNER more efficiently without leveraging any external knowledge. Specifically, we overcome the limitations of sequence labeling-based approaches by predicting the relationship between word pairs. Based on this, we enhance the pre-trained model BioBERT with the proposed prefix and attention map discrimination fusion guided attention, and propose E-BioBERT. Our proposed attention differentiates the distribution of different heads in different layers in BioBERT, which enriches the diversity of self-attention. Our model outperforms the state-of-the-art models it is compared with on five available datasets: BC4CHEMD, BC2GM, BC5CDR-Disease, BC5CDR-Chem, and NCBI-Disease, achieving F1-scores of 92.55%, 85.45%, 87.53%, 94.16% and 90.55%, respectively. CONCLUSION: Unlike many previous models, our method does not require additional training datasets, external knowledge, or a complex training process. The experimental results on five BioNER benchmark datasets demonstrate that our model is better at mining semantic information, alleviates the problem of label inconsistency, and has higher entity recognition ability.
More importantly, we analyze and demonstrate the effectiveness of our proposed attention.


Subjects
Knowledge Bases , Semantics , Data Mining/methods , Benchmarking
4.
BMC Bioinformatics ; 24(1): 3, 2023 Jan 03.
Article in English | MEDLINE | ID: mdl-36597033

ABSTRACT

PURPOSE: This manuscript proposes a hybrid algorithm that combines an improved BM25 algorithm, k-means clustering, and the BioBERT model to better identify biomedical articles in the PubMed database, so that more of the retrieved articles contain information closely related to a query about a specific disease. DESIGN/METHODOLOGY/APPROACH: A two-stage information retrieval method is proposed to conduct an improved Text-Rank algorithm. The first stage employs the improved BM25 algorithm to assign scores to biomedical articles in the database and identify the 1000 publications with the highest scores. The second stage employs a method called cluster-based abstract extraction to reduce the number of article abstracts to match the input constraints of the BioBERT model; a BioBERT-based document similarity matching method is then used to obtain the most similar search outcomes between the document and the retrieved morphemes. For reproducibility, the code is available at https://github.com/zzc1991/TREC_Precision_Medicine_Track . FINDINGS: The proposed model is trained on the TREC2017 and TREC2018 data sets, and the TREC2019 data are used as a validation set, confirming the effectiveness and practicability of the proposed algorithm, which is intended for clinical decision support in precision medicine and is designed to generalize. ORIGINALITY/VALUE: This research integrates multiple machine learning and text processing methods into a hybrid method applicable to domain-specific medical literature retrieval. The proposed algorithm improves P@10 by 3% over the state-of-the-art algorithm in TREC 2019.
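
The first retrieval stage described in this abstract, scoring every article against the query and keeping the highest-scoring ones, can be illustrated with a minimal sketch. It uses the standard Okapi BM25 formula, not the authors' improved variant, and the toy documents, the query, the function names, and the parameter values `k1=1.2` and `b=0.75` are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query (standard Okapi BM25)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization penalizes long documents.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def top_k(query, docs, k):
    """Return the indices of the k highest-scoring documents."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical toy corpus; the abstract keeps the top 1000 articles, here k=2.
docs = [
    "braf mutation melanoma treatment".split(),
    "diabetes insulin therapy outcomes".split(),
    "melanoma immunotherapy response braf inhibitor".split(),
]
query = "braf melanoma".split()
print(top_k(query, docs, k=2))  # → [0, 2]
```

In a pipeline like the one described, the shortlist returned by `top_k` would then be passed to the second, BioBERT-based stage, which this sketch does not cover.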


Subjects
Decision Support Systems, Clinical , Precision Medicine , Reproducibility of Results , Algorithms , Machine Learning
5.
BMC Bioinformatics ; 23(1): 4, 2022 Jan 04.
Article in English | MEDLINE | ID: mdl-34983371

ABSTRACT

MOTIVATION: Protein-protein interactions (PPIs) are critical to normal cellular function and are related to many disease pathways. A range of protein functions are mediated and regulated by protein interactions through post-translational modifications (PTM). However, only 4% of PPIs are annotated with PTMs in biological knowledge databases such as IntAct, mainly performed through manual curation, which is neither time- nor cost-effective. Here we aim to facilitate annotation by using deep learning trained on distantly supervised data to extract PPIs along with their pairwise PTM from the literature, to aid human curation. METHOD: We use the IntAct PPI database to create a distantly supervised dataset annotated with interacting protein pairs, their corresponding PTM type, and associated abstracts from the PubMed database. We train an ensemble of BioBERT models, dubbed PPI-BioBERT-x10, to improve confidence calibration. We extend the ensemble average confidence approach with confidence variation to counteract the effects of class imbalance and extract high-confidence predictions. RESULTS AND CONCLUSION: The PPI-BioBERT-x10 model evaluated on the test set achieved a modest micro-F1 of 41.3 (P = 58.1, R = 32.1). However, by combining high confidence and low variation to identify high-quality predictions, tuning the predictions for precision, we retained 19% of the test predictions with 100% precision. We evaluated PPI-BioBERT-x10 on 18 million PubMed abstracts and extracted 1.6 million (546507 unique PTM-PPI triplets) PTM-PPI predictions, and filtered [Formula: see text] (4584 unique) high-confidence predictions. Of the 5700, human evaluation on a small randomly sampled subset shows that the precision drops to 33.7% despite confidence calibration, which highlights the challenges of generalisability beyond the test set even with confidence calibration.
We circumvent the problem by only including predictions associated with multiple papers, improving the precision to 58.8%. In this work, we highlight the benefits and challenges of deep learning-based text mining in practice, and the need for increased emphasis on confidence calibration to facilitate human curation efforts.
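
The selection idea in this abstract, keeping only predictions on which the ensemble is both confident on average and consistent across members, might look roughly like the sketch below. The thresholds `min_conf` and `max_std`, the function name, and the example probabilities are hypothetical; the abstract does not state the values or the exact variation measure the authors used, so population standard deviation is an assumption here.

```python
from statistics import mean, pstdev


def high_quality_predictions(ensemble_probs, min_conf=0.9, max_std=0.05):
    """Keep examples whose ensemble-average confidence is high and whose
    confidence varies little across ensemble members.

    ensemble_probs: one entry per example; each entry is a list with one
    class-probability list per model.
    Returns (example_index, predicted_class, mean_confidence) tuples.
    """
    kept = []
    for idx, per_model in enumerate(ensemble_probs):
        n_classes = len(per_model[0])
        # Average the class probabilities over all ensemble members.
        avg = [mean([m[c] for m in per_model]) for c in range(n_classes)]
        label = max(range(n_classes), key=lambda c: avg[c])
        # Spread of the members' confidence in the predicted class.
        spread = pstdev([m[label] for m in per_model])
        if avg[label] >= min_conf and spread <= max_std:
            kept.append((idx, label, avg[label]))
    return kept


# Hypothetical three-model ensemble over a binary label.
probs = [
    [[0.95, 0.05], [0.97, 0.03], [0.96, 0.04]],  # confident and consistent
    [[1.00, 0.00], [0.80, 0.20], [1.00, 0.00]],  # confident on average, but inconsistent
    [[0.55, 0.45], [0.50, 0.50], [0.52, 0.48]],  # consistent, but not confident
]
for idx, label, conf in high_quality_predictions(probs):
    print(idx, label, round(conf, 2))
```

Only the first example passes both checks. Tightening either threshold trades recall for precision, which is the trade-off the abstract reports when it retains 19% of test predictions.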


Subjects
Data Mining , Protein Processing, Post-Translational , Humans , Proteins , PubMed
6.
BMC Bioinformatics ; 23(1): 501, 2022 Nov 22.
Article in English | MEDLINE | ID: mdl-36418937

ABSTRACT

BACKGROUND: Automatic and accurate recognition of various biomedical named entities from literature is an important task of biomedical text mining, which is the foundation of extracting biomedical knowledge from unstructured texts into structured formats. Using the sequence labeling framework and deep neural networks to implement biomedical named entity recognition (BioNER) is a common method at present. However, this method often underutilizes syntactic features such as dependencies and topology of sentences. Integrating semantic and syntactic features into the BioNER model is therefore an urgent problem. RESULTS: In this paper, we propose a novel biomedical named entity recognition model, named BioByGANS (BioBERT/SpaCy-Graph Attention Network-Softmax), which uses a graph to model the dependencies and topology of a sentence and formulates the BioNER task as a node classification problem. This formulation can introduce more topological features of language and is no longer concerned only with the distance between words in the sequence. First, we use periods to segment sentences, and spaces and symbols to segment words. Second, contextual features are encoded by BioBERT, and syntactic features such as parts of speech, dependencies and topology are preprocessed by SpaCy. A graph attention network is then used to generate a fused representation considering both the contextual features and syntactic features. Last, a softmax function is used to calculate the probabilities and get the results. We conduct experiments on 8 benchmark datasets, and our proposed model outperforms existing BioNER state-of-the-art methods on the BC2GM, JNLPBA, BC4CHEMD, BC5CDR-chem, BC5CDR-disease, NCBI-disease, Species-800, and LINNAEUS datasets, achieving F1-scores of 85.15%, 78.16%, 92.97%, 94.74%, 87.74%, 91.57%, 75.01%, and 90.99%, respectively.
CONCLUSION: The experimental results on 8 biomedical benchmark datasets demonstrate the effectiveness of our model, and indicate that formulating the BioNER task into a node classification problem and combining syntactic features into the graph attention networks can significantly improve model performance.


Subjects
Language , Semantics , Speech , Knowledge , Benchmarking
7.
J Biomed Inform ; 126: 103982, 2022 02.
Article in English | MEDLINE | ID: mdl-34974190

ABSTRACT

Transformer-based pretrained language models (PLMs) have started a new era in modern natural language processing (NLP). These models combine the power of transformers, transfer learning, and self-supervised learning (SSL). Following the success of these models in the general domain, the biomedical research community has developed various in-domain PLMs starting from BioBERT to the latest BioELECTRA and BioALBERT models. We strongly believe there is a need for a survey paper that can provide a comprehensive survey of various transformer-based biomedical pretrained language models (BPLMs). In this survey, we start with a brief overview of foundational concepts like self-supervised learning, embedding layer and transformer encoder layers. We discuss core concepts of transformer-based PLMs like pretraining methods, pretraining tasks, fine-tuning methods, and various embedding types specific to biomedical domain. We introduce a taxonomy for transformer-based BPLMs and then discuss all the models. We discuss various challenges and present possible solutions. We conclude by highlighting some of the open issues which will drive the research community to further improve transformer-based BPLMs. The list of all the publicly available transformer-based BPLMs along with their links is provided at https://mr-nlp.github.io/posts/2021/05/transformer-based-biomedical-pretrained-language-models-list/.


Subjects
Biomedical Research , Natural Language Processing , Language
8.
J Biomed Inform ; 122: 103893, 2021 10.
Article in English | MEDLINE | ID: mdl-34481058

ABSTRACT

Entity relation extraction plays an important role in the biomedical, healthcare, and clinical research areas. Recently, pre-trained models based on transformer architectures and their variants have shown remarkable performances in various natural language processing tasks. Most of these variants were based on slight modifications in the architectural components, representation schemes and augmenting data using distant supervision methods. In distantly supervised methods, one of the main challenges is pruning out noisy samples. A similar situation can arise when the training samples are not directly available but need to be constructed from the given dataset. The BioCreative V Chemical Disease Relation (CDR) task provides a dataset that does not explicitly offer mention-level gold annotations and hence replicates the above scenario. Selecting the representative sentences from the given abstract or document text that could convey a potential entity relationship becomes essential. Most of the existing methods in literature propose to either consider the entire text or all the sentences which contain the entity mentions. This could be a computationally expensive and time consuming approach. This paper presents a novel approach to handle such scenarios, specifically in biomedical relation extraction. We propose utilizing the Shortest Dependency Path (SDP) features for constructing data samples by pruning out noisy information and selecting the most representative samples for model learning. We also utilize triplet information in model learning using the biomedical variant of BERT, viz., BioBERT. The problem is represented as a sentence pair classification task using the sentence and the entity-relation pair as input. We analyze the approach on both intra-sentential and inter-sentential relations in the CDR dataset. The proposed approach that utilizes the SDP and triplet features presents promising results, specifically on the inter-sentential relation extraction task. 
We make the code used for this work publicly available on GitHub.
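
To make the Shortest Dependency Path (SDP) idea concrete, here is a minimal sketch that finds the shortest path between two entity tokens in a dependency tree using breadth-first search. The sentence, its hand-written dependency edges, and the function name are hypothetical; a real pipeline would obtain the edges from a dependency parser, and this is not the authors' released code.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Breadth-first search over a dependency tree treated as an undirected graph.

    edges: iterable of (head, dependent) token pairs.
    Returns the list of tokens on the shortest path, or None if unreachable.
    """
    graph = {}
    for head, dep in edges:
        graph.setdefault(head, set()).add(dep)
        graph.setdefault(dep, set()).add(head)
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        # Sorted only to make the traversal order deterministic.
        for nxt in sorted(graph.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Hand-written parse of "Aspirin may induce severe asthma in sensitive patients".
edges = [
    ("induce", "Aspirin"),
    ("induce", "may"),
    ("induce", "asthma"),
    ("asthma", "severe"),
    ("induce", "patients"),
    ("patients", "in"),
    ("patients", "sensitive"),
]
print(shortest_dependency_path(edges, "Aspirin", "asthma"))
# → ['Aspirin', 'induce', 'asthma']
```

The path drops modifiers such as "severe" and "sensitive" and keeps the verb that links the chemical to the disease, which is the kind of noise pruning the abstract describes.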


Subjects
Natural Language Processing , Research Design , Language
9.
J Theor Biol ; 488: 110112, 2020 03 07.
Article in English | MEDLINE | ID: mdl-31883441

ABSTRACT

Extracting biological relations from biomedical literature can help deliver personalized treatment to individual patients based on their genomic profiles. In this paper, we present a novel sentence-level attention-based deep neural network to predict the semantic relationship between medical entities. We utilize a transfer learning based paradigm which considerably improves the prediction performance. The main distinction of the proposed approach is that it relies solely on sentence information, putting aside handcrafted biomedical features. Sentence information is transformed into embedding vectors and improved by pre-trained embedding models trained on PubMed and PMC papers. Extensive evaluations show that the proposed approach achieves competitive performance in comparison with state-of-the-art methods, while not requiring any domain-specific biomedical features. The evaluation data and resources are available at https://github.com/EsmaeilNourani/Deep-GDAE/.


Subjects
Neural Networks, Computer , Publications , Humans , Machine Learning , PubMed , Research Design
10.
J Biomed Inform ; 106: 103451, 2020 06.
Article in English | MEDLINE | ID: mdl-32454243

ABSTRACT

Drug-drug interaction (DDI) extraction is an important task in biomedical relation extraction and plays an important role in pharmacovigilance. Previous neural network based models have achieved good performance in DDI extraction. However, most of the previous models did not make good use of the information in drug entity names, which can help to judge the relation between drugs. This is mainly because drug names are often very complex, so neural network models cannot understand their semantics directly. To address this issue, we propose a DDI extraction model using multiple entity-aware attentions with various entity information. We use an output-modified bidirectional transformer (BioBERT) and a bidirectional gated recurrent unit layer (BiGRU) to obtain the vector representation of sentences. The vectors of drug description documents encoded by Doc2Vec are used as drug description information, which serves as external knowledge for our model. Then we construct three different kinds of entity-aware attentions to get the sentence representations with entity information weighted, including attentions using the drug description information. The outputs of the attention layers are concatenated and fed into a multi-layer perceptron layer. Finally, we get the result from a softmax classifier. The F-score is used to evaluate our model, as in most previous DDI extraction models. We evaluate our proposed model on the DDIExtraction 2013 corpus, which is the benchmark corpus of this domain, and it achieves the state-of-the-art result (80.9% in F-score).


Subjects
Neural Networks, Computer , Pharmaceutical Preparations , Attention , Drug Interactions , Semantics
11.
Cardiorenal Med ; 14(1): 307-319, 2024.
Article in English | MEDLINE | ID: mdl-38740015

ABSTRACT

INTRODUCTION: Cardiorenal syndrome encompasses a range of disorders involving both the heart and kidneys, wherein dysfunction in one organ may induce dysfunction in the other, either acutely or chronically. METHODS: This study conducted a literature search on cardiorenal syndrome from January 1, 2003, to September 8, 2023. Meanwhile, a quantitative analysis of the developmental trajectory, research hotspots and evolutionary trends in the field of cardiorenal syndrome through bibliometric analysis and knowledge mapping was carried out. RESULTS: The annual publication trend analysis revealed a consistent annual increase in cardiorenal syndrome literature over the last 20 years. The IL6, REN, and INS genes were identified as the current research hotspots. CONCLUSION: The field of cardiorenal syndrome exhibits promising potential to grow and is emerging as a prominent research area. Future endeavours should prioritise a comprehensive understanding of the field and foster multi-centre co-operation among different countries and regions.


Subjects
Bibliometrics , Cardio-Renal Syndrome , Cardio-Renal Syndrome/physiopathology , Humans , Biomedical Research/trends
12.
Heliyon ; 10(11): e32279, 2024 Jun 15.
Article in English | MEDLINE | ID: mdl-38912449

ABSTRACT

Early cancer detection and treatment depend on the discovery of specific genes that cause cancer. The classification of genetic mutations was initially done manually. However, this process relies on pathologists and can be a time-consuming task. Therefore, to improve the precision of clinical interpretation, researchers have developed computational algorithms that leverage next-generation sequencing technologies for automated mutation analysis. This paper utilized four deep learning classification models with training collections of biomedical texts. These models comprise bidirectional encoder representations from transformers for Biomedical text mining (BioBERT), a specialized language model implemented for biological contexts. Impressive results in multiple tasks, including text classification, language inference, and question answering, can be obtained by simply adding an extra layer to the BioBERT model. Moreover, bidirectional encoder representations from transformers (BERT), long short-term memory (LSTM), and bidirectional LSTM (BiLSTM) have been leveraged to produce very good results in categorizing genetic mutations based on textual evidence. The dataset used in the work was created by Memorial Sloan Kettering Cancer Center (MSKCC), which contains several mutations. Furthermore, this dataset poses a major classification challenge in the Kaggle research prediction competitions. In carrying out the work, three challenges were identified: enormous text length, biased representation of the data, and repeated data instances. Based on the commonly used evaluation metrics, the experimental results show that the BioBERT model outperforms other models with an F1 score of 0.87 and 0.850 MCC, which can be considered as improved performance compared to similar results in the literature that have an F1 score of 0.70 achieved with the BERT model.
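
The two metrics reported in this abstract, the F1 score and the Matthews correlation coefficient (MCC), can be computed from a confusion matrix as in the sketch below. It covers the binary case only, as a simplification; the study's task is a mutation classification task and the abstract does not say how the scores were averaged. The function name and the counts in the example are hypothetical.

```python
import math


def f1_and_mcc(tp, fp, fn, tn):
    """Compute the F1 score and MCC from binary confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom_f1 = precision + recall
    f1 = 2 * precision * recall / denom_f1 if denom_f1 else 0.0
    # MCC uses all four cells, so it also reflects true negatives.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc


# Hypothetical counts: 40 TP, 10 FP, 10 FN, 40 TN.
f1, mcc = f1_and_mcc(tp=40, fp=10, fn=10, tn=40)
print(f"F1 = {f1:.2f}, MCC = {mcc:.2f}")  # → F1 = 0.80, MCC = 0.60
```

F1 ignores true negatives while MCC does not, which is one reason the two are often reported together on imbalanced data such as the biased class distribution the abstract mentions.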

13.
Artif Intell Med ; 151: 102848, 2024 May.
Article in English | MEDLINE | ID: mdl-38658132

ABSTRACT

Medical Knowledge Graphs (MKGs) are vital in propelling big data technologies in healthcare and facilitating the realization of medical intelligence. However, large-scale MKGs often exhibit data sparsity and missing facts. Following the latest advances, knowledge embedding addresses these problems by performing knowledge graph completion. Most knowledge embedding algorithms rely solely on triplet structural information, overlooking the rich information hidden within entity property sets, which leads to performance bottlenecks when dealing with the intricate relations of MKGs. Inspired by the semantic sensitivity and explicit type constraints unique to the medical domain, we propose a BioBERT-based graph embedding model. This model represents an evolvable framework that integrates graph embedding, language embedding, and type information, thereby optimizing the utility of MKGs. Our study utilizes not only WordNet as a benchmark dataset but also incorporates MedicalKG to compare and corroborate the specificity of medical knowledge. Experimental results on these datasets indicate that the proposed fusion framework achieves state-of-the-art (SOTA) performance compared to other baselines. We believe that this incremental improvement provides promising insights for future medical knowledge graph completion endeavors.


Subjects
Algorithms , Humans , Artificial Intelligence , Semantics , Big Data
14.
Syst Rev ; 13(1): 107, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38622611

ABSTRACT

BACKGROUND: Abstract review is a time- and labor-consuming step in systematic and scoping literature reviews in medicine. Text mining methods, typically natural language processing (NLP), may efficiently replace manual abstract screening. This study applies NLP to a deliberately selected literature review problem, the trend of using NLP in medical research, to demonstrate the performance of this automated abstract review model. METHODS: Scanning the PubMed, Embase, PsycINFO, and CINAHL databases, we identified 22,294 abstracts, with a final selection of 12,817 English abstracts published between 2000 and 2021. We devised a manual classification scheme for medical fields with three variables, i.e., the context of use (COU), text source (TS), and primary research field (PRF). A training dataset was developed after reviewing 485 abstracts. We used a language model called Bidirectional Encoder Representations from Transformers to classify the abstracts. To evaluate the performance of the trained models, we report a micro F1-score and accuracy. RESULTS: The trained models' micro F1-scores for classifying abstracts into the three variables were 77.35% for COU, 76.24% for TS, and 85.64% for PRF. The average annual growth rate (AAGR) of the publications was 20.99% between 2000 and 2020 (72.01 articles (95% CI: 56.80-78.30) yearly increase), with 81.76% of the abstracts published between 2010 and 2020. Studies on neoplasms constituted 27.66% of the entire corpus with an AAGR of 42.41%, followed by studies on mental conditions (AAGR = 39.28%). While electronic health or medical records comprised the highest proportion of text sources (57.12%), omics databases had the highest growth among all text sources with an AAGR of 65.08%. The most common NLP application was clinical decision support (25.45%). CONCLUSIONS: BioBERT showed acceptable performance in the abstract review. If future research confirms high performance for this language model, it could reliably replace manual abstract reviews.
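
The average annual growth rate (AAGR) quoted in this abstract is commonly defined as the arithmetic mean of the year-over-year growth rates. The sketch below assumes that definition, since the abstract does not spell out its formula, and uses hypothetical publication counts and a hypothetical function name.

```python
def average_annual_growth_rate(counts):
    """Return the AAGR in percent for a list of yearly counts.

    Assumes the common definition: the mean of (current - previous) / previous
    over consecutive years. Every count except the last must be non-zero.
    """
    rates = [(curr - prev) / prev for prev, curr in zip(counts, counts[1:])]
    return 100 * sum(rates) / len(rates)


# Hypothetical yearly publication counts: growth of 20% and then 25%.
print(round(average_annual_growth_rate([100, 120, 150]), 2))  # → 22.5
```

Because it averages the yearly rates directly, the AAGR is sensitive to a single volatile year, so it can differ noticeably from a compound growth rate computed over the same period.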


Subjects
Biomedical Research , Natural Language Processing , Humans , Language , Data Mining , Electronic Health Records
15.
Sci Rep ; 14(1): 7697, 2024 04 02.
Article in English | MEDLINE | ID: mdl-38565624

ABSTRACT

The rapid increase in biomedical publications necessitates efficient systems to automatically handle Biomedical Named Entity Recognition (BioNER) tasks in unstructured text. However, accurately detecting biomedical entities is quite challenging due to the complexity of their names and the frequent use of abbreviations. In this paper, we propose BioBBC, a deep learning (DL) model that utilizes multi-feature embeddings and is built on the BERT-BiLSTM-CRF architecture to address the BioNER task. BioBBC consists of three main layers: an embedding layer, a bidirectional Long Short-Term Memory (BiLSTM) layer, and a Conditional Random Fields (CRF) layer. BioBBC takes sentences from the biomedical domain as input and identifies the biomedical entities mentioned within the text. The embedding layer generates enriched contextual representation vectors of the input by learning the text through four types of embeddings: part-of-speech (POS) tag embedding, char-level embedding, BERT embedding, and data-specific embedding. The BiLSTM layer produces additional syntactic and semantic feature representations. Finally, the CRF layer identifies the best possible tag sequence for the input sentence. Our model is well-constructed and well-optimized for detecting different types of biomedical entities. In experiments on six benchmark BioNER datasets, our model outperformed state-of-the-art (SOTA) models with significant improvements.
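
The final step in this abstract, where the CRF layer picks the best tag sequence, is usually carried out with Viterbi decoding. The sketch below assumes a linear-chain CRF with per-token emission scores and tag-to-tag transition scores; the tag set, all score values, and the function name are invented for illustration and are not taken from BioBBC.

```python
def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag sequence for a linear-chain model.

    emissions: list of {tag: score} dicts, one per token.
    transitions: {(previous_tag, current_tag): score} for every tag pair.
    """
    best = {t: emissions[0][t] for t in tags}
    backpointers = []
    for em in emissions[1:]:
        new_best, pointers = {}, {}
        for curr in tags:
            # Best previous tag for reaching `curr` at this position.
            prev = max(tags, key=lambda p: best[p] + transitions[(p, curr)])
            new_best[curr] = best[prev] + transitions[(prev, curr)] + em[curr]
            pointers[curr] = prev
        backpointers.append(pointers)
        best = new_best
    # Trace the best path backwards from the best final tag.
    path = [max(tags, key=lambda t: best[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


tags = ["O", "B", "I"]
transitions = {(p, c): 0.0 for p in tags for c in tags}
transitions[("O", "I")] = -10.0  # an inside tag should not follow an outside tag

# Hypothetical emission scores for the tokens "the", "breast", "cancer".
emissions = [
    {"O": 2.0, "B": 0.0, "I": 0.0},
    {"O": 1.0, "B": 0.8, "I": 0.0},
    {"O": 0.0, "B": 0.5, "I": 2.0},
]
print(viterbi(emissions, transitions, tags))  # → ['O', 'B', 'I']
```

Choosing the top-scoring tag for each token independently would give O, O, I, which is not a valid BIO sequence. The transition penalty lets the decoder relabel "breast" as B, which illustrates why a sequence-level layer is placed on top of the BiLSTM.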


Subjects
Language , Semantics , Natural Language Processing , Benchmarking , Speech
16.
Eur Heart J Digit Health ; 5(3): 229-234, 2024 May.
Article in English | MEDLINE | ID: mdl-38774372

ABSTRACT

Aims: ICD codes are used for classification of hospitalizations. The codes are used for administrative, financial, and research purposes. It is known, however, that errors occur. Natural language processing (NLP) offers promising solutions for optimizing the process. This study aimed to investigate methods for automatic classification of disease in unstructured medical records using NLP and to compare these to conventional ICD coding. Methods and results: Two datasets were used: the open-source Medical Information Mart for Intensive Care (MIMIC)-III dataset (n = 55,177) and a dataset from a hospital in Belgium (n = 12,706). Automated searches using NLP algorithms were performed for the diagnoses 'atrial fibrillation (AF)' and 'heart failure (HF)'. Four methods were used: rule-based search, logistic regression, term frequency-inverse document frequency (TF-IDF), Extreme Gradient Boosting (XGBoost), and Bio-Bidirectional Encoder Representations from Transformers (BioBERT). All algorithms were developed on the MIMIC-III dataset. The best performing algorithm was then deployed on the Belgian dataset. After preprocessing, a total of 1438 reports were retained in the Belgian dataset. XGBoost on the TF-IDF matrix resulted in an accuracy of 0.94 and 0.92 for AF and HF, respectively. There were 211 mismatches between algorithm and ICD codes. One hundred and three were due to a difference in data availability or differing definitions. Of the remaining 108 mismatches, 70% were due to incorrect labelling by the algorithm and 30% were due to erroneous ICD coding (2% of total hospitalizations). Conclusion: A newly developed NLP algorithm attained a high accuracy for classifying disease in medical records. XGBoost outperformed the deep learning technique BioBERT. NLP algorithms could be used to identify ICD-coding errors and optimize and support the ICD-coding process.
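
The TF-IDF representation named in this abstract weights each term by how often it occurs in a report and how rare it is across all reports. The sketch below uses the textbook formulation (term frequency times the log of inverse document frequency); library implementations typically add smoothing and normalization, so their values differ. The report snippets and the function name are hypothetical, and the XGBoost classifier that the study trained on the resulting matrix is not shown.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    weight = (term count / document length) * log(number of documents / document frequency)
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


# Hypothetical report snippets, already lower-cased and tokenized.
reports = [
    "patient with atrial fibrillation".split(),
    "patient with heart failure".split(),
    "patient with no cardiac history".split(),
]
vectors = tf_idf(reports)
print(vectors[0]["patient"])                 # → 0.0
print(round(vectors[0]["fibrillation"], 3))  # → 0.275
```

Terms that appear in every report, such as "patient", receive zero weight, while diagnosis-specific terms such as "fibrillation" stand out, which is what makes these vectors useful as classifier input.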

17.
Pharmaceutics ; 15(7)2023 Jun 26.
Article in English | MEDLINE | ID: mdl-37514010

ABSTRACT

Drug-drug interactions (DDIs) give healthcare professionals essential insight, since they describe the impact of administering medications concurrently to patients during therapy. Several relevant works related to the DDIExtraction2013 Challenge are available in the current technical literature. This study aims to improve on previous results using two models in which a Gaussian noise layer is added to achieve better DDI relationship extraction. (1) A Piecewise Convolutional Neural Network (PW-CNN) model is used to capture relationships among pharmacological entities described in biomedical databases. Additionally, the model incorporates multichannel word representations to enrich its vocabulary and reduce the number of unknown words. (2) A second model uses the pre-trained BERT language model to classify relationships, while also integrating data from the target entities. After identifying the target entities, the model passes the relevant information through the pre-trained architecture and integrates the encoded data for both entities. The experimental results show improved performance with respect to previous models.
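
The abstract above does not specify how the encoded data for the two target entities are integrated. One common approach in entity-aware BERT classifiers is to mean-pool the encoder's hidden states over each entity span and concatenate the results with the first-token vector. The sketch below illustrates only that idea on plain Python lists; the function names, spans, and toy vectors are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def relation_features(hidden: List[Vector],
                      e1_span: Tuple[int, int],
                      e2_span: Tuple[int, int]) -> Vector:
    """Concatenate the first-token vector with the pooled vectors of both entities.

    `hidden` stands in for per-token encoder outputs; spans are
    half-open [start, end) token index ranges.
    """
    first_token = hidden[0]
    entity1 = mean_pool(hidden[e1_span[0]:e1_span[1]])
    entity2 = mean_pool(hidden[e2_span[0]:e2_span[1]])
    return first_token + entity1 + entity2


# Toy hidden states: 5 tokens, 2 dimensions each (made-up numbers).
hidden_states = [[1.0, 0.0], [0.0, 2.0], [2.0, 4.0], [0.5, 0.5], [3.0, 1.0]]
features = relation_features(hidden_states, e1_span=(1, 3), e2_span=(4, 5))
print(features)
```

In a full model, the concatenated vector would be passed to a classification layer that predicts the interaction type.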

18.
Technol Health Care ; 31(S1): 111-121, 2023.
Article in English | MEDLINE | ID: mdl-37038786

ABSTRACT

BACKGROUND: With the exponential increase in the volume of biomedical literature, text mining tasks are becoming increasingly important in the medical domain. Named entity recognition is a primary task in text mining and a prerequisite and critical component for building medical domain knowledge graphs, medical question answering systems, and medical text classification. OBJECTIVE: The goal of the study is to recognize biomedical entities effectively by fusing multi-feature embeddings. Multiple features provide more comprehensive information so that better predictions can be obtained. METHODS: First, three different kinds of features are generated at the word representation layer: deep contextual word-level features, local character-level features, and part-of-speech features. The word representation vectors are then fed into a BiLSTM as features to obtain the dependency information. Finally, the CRF algorithm is used to learn the features of the state sequences and obtain the globally optimal tagging sequences. RESULTS: The experimental results showed that the model outperformed other state-of-the-art methods in overall performance on six of eight datasets covering four biomedical entity types. CONCLUSION: The proposed method has a positive effect on the prediction results. It comprehensively considers the relevant factors of named entity recognition because the semantic information is enhanced by fusing multi-feature embeddings.


Subjects
Algorithms, Skin Neoplasms, Humans, Data Mining, Learning, Speech
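
The final step described in the abstract of entry 18, where a CRF layer selects the globally optimal tag sequence, is typically carried out with Viterbi decoding. The following is a minimal sketch of that decoding step only, assuming per-token emission scores (which a BiLSTM would supply) and tag-to-tag transition scores. All scores, tag names, and the function name are made up for illustration, and start/end transitions are omitted for brevity.

```python
from typing import Dict, List


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]],
            tags: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence for one sentence."""
    # best[t][tag]: best score of any path that ends in `tag` at position t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []  # back[t-1][tag]: previous tag on that best path
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


tags = ["O", "B", "I"]
# Hypothetical transition scores: O -> I is heavily penalised (invalid in BIO).
transitions = {
    "O": {"O": 0.0, "B": 0.0, "I": -5.0},
    "B": {"O": 0.0, "B": -1.0, "I": 1.0},
    "I": {"O": 0.0, "B": -1.0, "I": 0.5},
}
# Hypothetical emission scores for a three-token sentence.
emissions = [
    {"O": 1.0, "B": 0.9, "I": 0.0},
    {"O": 0.2, "B": 0.1, "I": 1.5},
    {"O": 1.2, "B": 0.0, "I": 0.1},
]
print(viterbi(emissions, transitions, tags))  # ['B', 'I', 'O']
```

Picking the best tag per token independently would give O, I, O here, which is not a valid BIO sequence. Decoding over the whole sequence with transition scores yields B, I, O instead, which is the benefit the CRF layer is meant to provide.
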
19.
Front Genet ; 13: 799349, 2022.
Article in English | MEDLINE | ID: mdl-35571049

ABSTRACT

Recent years have witnessed growth in herbalism studies that adopt a modern scientific approach in molecular medicine, offering valuable domain knowledge that can potentially boost the development of herbalism with evidence-supported efficacy and safety. However, these domain-specific scientific findings have not been systematically organized, which affects the efficiency of knowledge discovery and usage. Existing knowledge graphs in herbalism mainly focus on diagnosis and treatment and lack connections to molecular medicine. To fill this gap, we present HerbKG, a knowledge graph that bridges herbal and molecular medicine. The core bio-entities of HerbKG include herbs, chemicals extracted from the herbs, genes that are affected by the chemicals, and diseases treated by herbs due to the functions of genes. We have developed a learning framework to automate the process of HerbKG construction. The resulting HerbKG, after analyzing over 500K PubMed abstracts, is populated with 53K relations, providing extensive herbal-molecular domain knowledge in support of downstream applications. The code and an interactive tool are available at https://github.com/FeiYee/HerbKG.

20.
Methods Mol Biol ; 2496: 221-235, 2022.
Article in English | MEDLINE | ID: mdl-35713867

ABSTRACT

In biomedicine, facts about relations between entities (disease, gene, drug, etc.) are hidden in a large trove of 30 million scientific publications. Curated information of this kind has been shown to play an important role in various applications such as drug repurposing and precision medicine. Recently, owing to advances in deep learning, a transformer architecture named BERT (Bidirectional Encoder Representations from Transformers) has been proposed. This pretrained language model, trained using the Books Corpus with 800M words and English Wikipedia with 2500M words, reported state-of-the-art results in various NLP (Natural Language Processing) tasks, including relation extraction. It is widely accepted that, because of the shift in word distribution, general-domain models perform poorly in information extraction tasks in the biomedical domain. For this reason, the architecture was later adapted to the biomedical domain by training the language models on 28 million scientific articles from PubMed and PubMed Central. This chapter presents a protocol for relation extraction using BERT by discussing the state of the art for BERT versions in the biomedical domain, such as BioBERT. The protocol emphasizes the general BERT architecture, pretraining and fine-tuning, leveraging biomedical information, and finally a knowledge graph infusion to the BERT model layer.


Subjects
Information Storage and Retrieval, Natural Language Processing, Language, PubMed, Publications