1.
BMC Bioinformatics ; 23(1): 486, 2022 Nov 16.
Article in English | MEDLINE | ID: mdl-36384464

ABSTRACT

BACKGROUND: Unstructured text in medical records, such as Electronic Health Records, contains an enormous amount of valuable information for research; however, it is difficult to extract and structure important information because of frequent typographical errors. Therefore, improving the quality of error-laden data for text analysis is an essential task. To date, few studies have addressed this. Here, we propose a new methodology for extracting important information from unstructured medical texts by overcoming the typographical problem in surgical pathology records related to lung cancer. METHODS: We propose a typo correction model that considers context, based on the Masked Language Model, to solve the problem of typographical errors in real-world medical data. In addition, a word dictionary based on PubMed abstracts was used for the typo correction model. After refining the data through typo correction, fine-tuning was performed on a pre-trained BERT model. Next, deep learning-based Named Entity Recognition (NER) was performed. By solving the quality problem of medical data, we sought to improve the accuracy of information extraction in unstructured text data. RESULTS: We compared the performance of the proposed typo correction model based on contextual information with an existing SymSpell model. We confirmed that our proposed model outperformed the existing model in a typographical correction task. The F1-score of the model improved by approximately 5% and 9% when compared with the model without contextual information in the NCBI-disease and surgical pathology record datasets, respectively. In addition, the F1-score of NER after typo correction increased by 2% in the NCBI-disease dataset. There was a significant performance difference of approximately 25% before and after typo correction in the surgical pathology record dataset. This confirmed that typos influenced information extraction from the unstructured text. CONCLUSION: We verified that typographical errors in unstructured text negatively affect the performance of natural language processing tasks. The proposed typo correction model outperformed the existing SymSpell model. This study shows that the proposed model is robust and can be applied in real-world environments by focusing on the typos that cause difficulties in analyzing unstructured medical text.
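The idea of context-aware typo correction described in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the masked language model is replaced here by a toy bigram table (`BIGRAMS`), and the dictionary, the `context_score` interface, and the edit-distance threshold are all assumptions made for illustration.

```python
from collections import Counter

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def correct(tokens, dictionary, context_score, max_dist=2):
    """Replace out-of-dictionary tokens with the candidate that best fits its neighbours."""
    corrected = []
    for i, tok in enumerate(tokens):
        if tok in dictionary:
            corrected.append(tok)
            continue
        cands = [w for w in dictionary if edit_distance(tok, w) <= max_dist]
        if not cands:
            corrected.append(tok)  # nothing close enough: leave the token untouched
            continue
        left = corrected[-1] if corrected else None
        right = tokens[i + 1] if i + 1 < len(tokens) else None
        # Rank by context fit first, then prefer the smaller edit distance.
        best = max(cands, key=lambda c: (context_score(left, c, right),
                                         -edit_distance(tok, c)))
        corrected.append(best)
    return corrected

# Hypothetical stand-in for a masked language model: toy bigram counts.
BIGRAMS = Counter({("lung", "cancer"): 5, ("cancel", "appointment"): 3})

def bigram_score(left, cand, right):
    return BIGRAMS[(left, cand)] + BIGRAMS[(cand, right)]

DICTIONARY = {"lung", "cancer", "cancel", "appointment"}
fixed_a = correct(["lung", "cance"], DICTIONARY, bigram_score)
fixed_b = correct(["cance", "appointment"], DICTIONARY, bigram_score)
```

The misspelling "cance" is one edit away from both "cancer" and "cancel", so a purely distance-based corrector cannot choose between them; the neighbouring word breaks the tie, which is the effect the abstract attributes to contextual information.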


Subjects
Information Storage and Retrieval; Natural Language Processing; Electronic Health Records; Language; PubMed
2.
BMC Bioinformatics ; 21(Suppl 5): 405, 2020 Oct 26.
Article in English | MEDLINE | ID: mdl-33106157

ABSTRACT

BACKGROUND: In recent years, Traditional Chinese Medicine (TCM) and alternative medicine have been widely used along with western drugs as a complementary form of treatment. In this study, we first use the scientific literature to identify western drugs with obvious side effects. Then, we find TCM alternatives for these western drugs to ameliorate their side effects. RESULTS: We used depression as a case study. To evaluate our method, we showed the relation between herb-ingredients-target-disease for representative alternative herbs of western drugs. Further, a protein-protein interaction network of western drugs and alternative herbs was produced, and we performed enrichment analysis of the targets of the active ingredients of the herbs and examined the enrichment of Gene Ontology terms for Biological Process, Cellular Component, and Molecular Function and KEGG Pathway levels, to show how these targets affect different levels of gene expression. CONCLUSION: Our proposed method is able to select herbs that are highly relevant to the target indication (depression) and are able to treat the side effects caused by the target drug. The compounds from our selected alternative herbal medicines can therefore be complementary to the western drugs and ameliorate their side effects, which may help in the development of new drugs.


Subjects
Antidepressive Agents/adverse effects; Drugs, Chinese Herbal/pharmacology; Knowledge Discovery; Medicine, Chinese Traditional/statistics & numerical data; Plants, Medicinal/chemistry; Humans
3.
BMC Med Inform Decis Mak ; 19(Suppl 5): 240, 2019 12 05.
Article in English | MEDLINE | ID: mdl-31801521

ABSTRACT

BACKGROUND: Extracting useful information from biomedical literature plays an important role in the development of modern medicine. In natural language processing, there have been rigorous attempts to find meaningful relationships between entities automatically by co-occurrence-based methods. It has become increasingly important to understand whether relationships exist, and if so how strong they are, between any two entities extracted from a large number of texts. One of the defining methods is to measure semantic similarity and relatedness between two entities. METHODS: We propose a hybrid ranking method that combines a co-occurrence approach considering both direct and indirect entity pair relationships with specialized word embeddings for measuring the relatedness of two entities. RESULTS: We evaluate the proposed ranking method comparatively with other well-known methods such as co-occurrence, Word2Vec, COALS (Correlated Occurrence Analog to Lexical Semantics), and random indexing by calculating top-ranked entities related to Alzheimer's disease. In addition, we analyze gene, pathway, and gene-phenotype relationships. Overall, the proposed method tends to find more hidden relationships than the other methods. CONCLUSION: Our proposed method is able to select more useful related entities that not only co-occur frequently with the target entity but also have more indirect relations to it. In pathway analysis, our proposed method shows superior performance at identifying (functional) cross clustering and higher-level pathways. In phenotype analysis, our proposed method has an advantage in identifying the common genotype relating to phenotypes from biological literature.
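A hybrid ranking of this kind can be sketched as follows. This is only an illustration under stated assumptions, not the method from the paper: the indirect score (summing the weaker link over each intermediate entity), the max-normalization, the weights `w_direct`, `w_indirect`, `w_embed`, and the toy entities and vectors are all hypothetical.

```python
import math
from collections import defaultdict

def cooccurrence_counts(documents):
    """Count how often each unordered entity pair appears in the same document."""
    counts = defaultdict(int)
    for entities in documents:
        unique = sorted(set(entities))
        for i, a in enumerate(unique):
            for b in unique[i + 1:]:
                counts[(a, b)] += 1
    return counts

def pair_count(counts, a, b):
    """Look up a pair regardless of argument order."""
    return counts.get((a, b) if a < b else (b, a), 0)

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def hybrid_rank(target, documents, embeddings,
                w_direct=0.4, w_indirect=0.2, w_embed=0.4):
    """Rank entities by a weighted mix of direct, indirect and embedding relatedness."""
    counts = cooccurrence_counts(documents)
    entities = {e for doc in documents for e in doc} - {target}
    direct = {e: pair_count(counts, target, e) for e in entities}
    # Indirect evidence: target and e both co-occur with some intermediate m.
    indirect = {
        e: sum(min(pair_count(counts, target, m), pair_count(counts, m, e))
               for m in entities if m != e)
        for e in entities
    }
    max_d = max(direct.values(), default=0) or 1
    max_i = max(indirect.values(), default=0) or 1
    scores = {}
    for e in entities:
        emb = 0.0
        if target in embeddings and e in embeddings:
            emb = cosine(embeddings[target], embeddings[e])
        scores[e] = (w_direct * direct[e] / max_d
                     + w_indirect * indirect[e] / max_i
                     + w_embed * emb)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy data, invented purely for illustration.
docs = [["disease_X", "gene_A"], ["disease_X", "gene_A"],
        ["gene_A", "gene_B"], ["disease_X", "gene_C"]]
vectors = {"disease_X": [1.0, 0.0], "gene_A": [0.9, 0.1],
           "gene_B": [0.8, 0.2], "gene_C": [0.0, 1.0]}
ranking = hybrid_rank("disease_X", docs, vectors)
```

In this toy input `gene_B` never appears in the same document as `disease_X`, yet it ranks above `gene_C` because of its indirect link through `gene_A` and its embedding similarity. That is one way a hybrid score can surface relationships that a direct co-occurrence count would miss.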


Subjects
Alzheimer Disease/complications; Natural Language Processing; Algorithms; Alzheimer Disease/epidemiology; Alzheimer Disease/genetics; Cluster Analysis; Humans; Information Storage and Retrieval; Semantics
4.
BMC Bioinformatics ; 18(Suppl 7): 251, 2017 May 31.
Article in English | MEDLINE | ID: mdl-28617229

ABSTRACT

BACKGROUND: Bioinformatics is an interdisciplinary field at the intersection of molecular biology and computing technology. To characterize the field as a convergent domain, researchers have used bibliometrics, augmented with text-mining techniques for content analysis. In previous studies, Latent Dirichlet Allocation (LDA) was the most representative topic modeling technique for identifying the topic structure of subject areas. However, LDA displays only the simple topic structure and does not reveal the topic structure in relation to metadata such as authors, publication date, and journals. METHODS: In this paper, we adopt Tang et al.'s Author-Conference-Topic (ACT) model to study the field of bioinformatics from the perspective of keyphrases, authors, and journals. The ACT model is capable of incorporating the paper, author, and conference into the topic distribution simultaneously. To obtain more meaningful results, we use journals and keyphrases instead of conferences and bag-of-words. For analysis, we used PubMed to collect forty-six bioinformatics journals from the MEDLINE database. We conducted time series topic analysis over four periods from 1996 to 2015 to further examine the interdisciplinary nature of bioinformatics. RESULTS: We analyze the ACT model results in each period. Additionally, for further integrated analysis, we conduct a time series analysis among the top-ranked keyphrases, journals, and authors according to their frequency. We also examine the patterns in the top journals by simultaneously identifying the topical probability in each period, as well as the top authors and keyphrases. The results indicate that in recent years diversified topics have become more prevalent and convergent topics have become more clearly represented. CONCLUSION: The results of our analysis imply that over time the field of bioinformatics has become more interdisciplinary, with a steady increase in peripheral fields such as conceptual, mathematical, and systems biology. These results are confirmed by integrated analysis of topic distribution as well as top-ranked keyphrases, authors, and journals.


Subjects
Computational Biology/methods; Models, Theoretical; Bibliometrics; Data Mining; Databases, Factual; Humans
5.
BMC Med Inform Decis Mak ; 16 Suppl 1: 68, 2016 07 18.
Article in English | MEDLINE | ID: mdl-27454860

ABSTRACT

BACKGROUND: The Variome corpus, a small collection of published articles about inherited colorectal cancer, includes annotations of 11 entity types and 13 relation types related to the curation of the relationship between genetic variation and disease. Due to the richness of these annotations, the corpus provides a good testbed for evaluation of biomedical literature information extraction systems. METHODS: In this paper, we focus on assessing performance on extracting the relations in the corpus, using gold standard entities as a starting point, to establish a baseline for extraction of relations important for extraction of genetic variant information from the literature. We test the application of the Public Knowledge Discovery Engine for Java (PKDE4J) system, a natural language processing system designed for information extraction of entities and relations in text, on the relation extraction task using this corpus. RESULTS: For the relations which are attested at least 100 times in the Variome corpus, we realise a performance ranging from 0.78 to 0.84 Precision-weighted F-score, depending on the relation. We find that the PKDE4J system adapted straightforwardly to the range of relation types represented in the corpus; some extensions to the original methodology were required to adapt to the multi-relational classification context. The results are competitive with state-of-the-art relation extraction performance on more heavily studied corpora, although the analysis shows that the Recall of a co-occurrence baseline outweighs the benefit of improved Precision for many relations, indicating the value of simple semantic constraints on relations. CONCLUSIONS: This work represents the first attempt to apply relation extraction methods to the Variome corpus. The results demonstrate that automated methods have good potential to structure the information expressed in the published literature related to genetic variants, connecting mutations to genes, diseases, and patient cohorts. Further development of such approaches will facilitate more efficient biocuration of genetic variant information into structured databases, leveraging the knowledge embedded in the vast publication literature.


Subjects
Colorectal Neoplasms/genetics; Data Mining/methods; Databases, Genetic; Genetic Variation/genetics; Humans
6.
J Biomed Inform ; 57: 320-32, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26277115

ABSTRACT

Due to an enormous number of scientific publications that cannot be handled manually, there is a rising interest in text-mining techniques for automated information extraction, especially in the biomedical field. Such techniques provide effective means of information search, knowledge discovery, and hypothesis generation. Most previous studies have primarily focused on the design and performance improvement of either named entity recognition or relation extraction. In this paper, we present PKDE4J, a comprehensive text-mining system that integrates dictionary-based entity extraction and rule-based relation extraction in a highly flexible and extensible framework. Starting with the Stanford CoreNLP, we developed the system to cope with multiple types of entities and relations. The system also has fairly good performance in terms of accuracy as well as the ability to configure text-processing components. We demonstrate its competitive performance by evaluating it on many corpora and find that it surpasses existing systems, with average F-measures of 85% for entity extraction and 81% for relation extraction.


Subjects
Data Mining; Knowledge Discovery; Knowledge; Periodicals as Topic; Publications
7.
Genes (Basel) ; 10(2)2019 02 19.
Article in English | MEDLINE | ID: mdl-30791472

ABSTRACT

BACKGROUND: Although there are many studies of drugs and their side effects, the underlying mechanisms of these side effects are not well understood. It is also difficult to understand the specific pathways between drugs and side effects. OBJECTIVE: The present study seeks to construct putative paths between drugs and their side effects by applying text-mining techniques to free text of biomedical studies, and to develop ranking metrics that could identify the most likely paths. MATERIALS AND METHODS: We extracted three types of relationships (drug-protein, protein-protein, and protein-side effect) from biomedical texts by using text mining and predefined relation-extraction rules. Based on the extracted relationships, we constructed whole drug-protein-side effect paths. For each path, we calculated its ranking score by a new ranking function that combines corpus- and ontology-based semantic similarity as well as co-occurrence frequency. RESULTS: We extracted 13 plausible biomedical paths connecting drugs and their side effects from cancer-related abstracts in the PubMed database. The top 20 paths were examined, and the proposed ranking function outperformed the other methods tested, including co-occurrence, COALS, and UMLS, as measured by P@5 to P@20. In addition, we confirmed that the paths are novel hypotheses that are worth investigating further. DISCUSSION: The risk of side effects has been an important issue for the US Food and Drug Administration (FDA). However, the causes and mechanisms of such side effects have not been fully elucidated. This study extends previous research on understanding drug side effects by using various techniques such as Named Entity Recognition (NER), Relation Extraction (RE), and semantic similarity. CONCLUSION: It is not easy to reveal the biomedical mechanisms of side effects due to a huge number of possible paths. However, we automatically generated predictable paths using the proposed approach, which could provide meaningful information to biomedical researchers to generate plausible hypotheses for the understanding of such mechanisms.
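The path-construction step and the P@k evaluation mentioned in this abstract can be sketched roughly as below. This is an assumption-laden illustration rather than the authors' pipeline: it assumes paths of the fixed form drug, protein, protein, side effect, treats protein-protein relations as undirected, and uses invented relation tuples; the paper's ranking function is not reproduced.

```python
from collections import defaultdict

def build_paths(drug_protein, protein_protein, protein_side_effect):
    """Chain three relation lists into (drug, protein, protein, side effect) paths."""
    neighbours = defaultdict(set)
    for a, b in protein_protein:
        neighbours[a].add(b)
        neighbours[b].add(a)  # assumption: interactions are undirected
    effects = defaultdict(set)
    for protein, effect in protein_side_effect:
        effects[protein].add(effect)
    paths = []
    for drug, p1 in drug_protein:
        for p2 in sorted(neighbours.get(p1, ())):
            for effect in sorted(effects.get(p2, ())):
                paths.append((drug, p1, p2, effect))
    return paths

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are in the relevant set."""
    if k <= 0:
        return 0.0
    return sum(1 for item in ranked[:k] if item in relevant) / k

# Invented relations, for illustration only.
paths = build_paths(
    drug_protein=[("drug_A", "P1")],
    protein_protein=[("P1", "P2"), ("P3", "P1")],
    protein_side_effect=[("P2", "nausea"), ("P3", "rash"), ("P4", "fatigue")],
)
p_at_2 = precision_at_k(paths, {("drug_A", "P1", "P2", "nausea")}, k=2)
```

In practice the candidate paths would first be ordered by a ranking score, and `precision_at_k` would then be evaluated at several cut-offs (for example k = 5, 10, 15, 20) against expert-judged relevant paths.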


Subjects
Algorithms; Data Mining/methods; Drug-Related Side Effects and Adverse Reactions; Animals; Humans; Protein Binding; Semantics
8.
Respir Physiol Neurobiol ; 192: 95-101, 2014 Feb 01.
Article in English | MEDLINE | ID: mdl-24361464

ABSTRACT

Since the imbalance of the nasal cavities due to nasal septal deviation (NSD) is a commonly observed anatomic variation in healthy adults, clinicians must often decide whether or not it is clinically relevant to the symptoms of nasal airway obstruction (NAO). The main reason for this is a lack of data correlating the symptoms of NAO with objective findings. The aim of our study is to find the correlation between fluid dynamic parameters and the anatomy of the nasal cavity with NSD by numerical simulation. Six computational models of nasal cavities with NSD were created from computed tomographic images: 3 symptomatic patients with NAO and 3 asymptomatic patients. Computational fluid dynamics (CFD) was used to simulate steady inspiratory airflows in each nasal cavity model and compare the fluid dynamic properties of each. In the symptomatic cases, the pressure drop from the naris to the end of the septum was larger, and more uneven flow partitioning was observed. Local maximum velocity and wall shear stress were higher in the symptomatic group than in the asymptomatic group. The symptoms of NAO seem to be related more to the nasal resistance from the naris to the end of the septum than to the total nasal resistance from naris to nasopharynx. Factors correlated with NAO by CFD can be used as elements in patient-specific objective diagnostic tools for NAO in the presence of NSD.


Subjects
Hydrodynamics; Nasal Cavity/physiopathology; Nasal Obstruction/pathology; Nasal Obstruction/physiopathology; Nasal Septum/physiopathology; Computer Simulation; Functional Laterality; Humans; Models, Biological; Nasal Cavity/diagnostic imaging; Nasal Septum/diagnostic imaging; Statistics as Topic; Stress, Physiological; Tomography, X-Ray Computed