Results 1 - 8 of 8
1.
BMC Bioinformatics; 24(1): 265, 2023 Jun 26.
Article in English | MEDLINE | ID: mdl-37365501

ABSTRACT

BACKGROUND: Unlike diseases, automatic recognition of disabilities has not received the same attention in the area of medical NLP. Progress in this direction is hampered by obstacles such as the lack of annotated corpora. Given a set of samples, neural architectures can learn to translate sequences from spontaneous representations into their corresponding standard representations. The aim of this paper is to present the latest advances in monolingual (Spanish) and crosslingual (from English to Spanish and vice versa) automatic disability annotation. The task consists of identifying disability mentions in medical texts written in Spanish within a collection of abstracts from journal papers related to the biomedical domain. RESULTS: To carry out the task, we combined deep learning models that use different embedding granularities for sequence-to-sequence tagging with a simple acronym and abbreviation detection module to boost coverage. CONCLUSIONS: Our monolingual experiments demonstrate that a good combination of different word embedding representations provides better results than single representations, significantly outperforming the state of the art in disability annotation in Spanish. Additionally, we experimented with zero-shot crosslingual transfer for disability annotation between English and Spanish, with promising results that might help overcome the data scarcity bottleneck, which is especially acute for disabilities.
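
A minimal sketch of what the acronym and abbreviation detection step could look like: the paper does not describe its module in detail, so the pattern below (a long form followed by a parenthesised short form) and the Spanish example sentence are assumptions made purely for illustration.

    # Illustrative sketch, not the authors' module: harvest "long form (SHORT)"
    # definitions so that later occurrences of the short form can also be tagged.
    import re

    DEFINITION = re.compile(r"([\w\s]{3,80}?)\s*\(([A-ZÁÉÍÓÚÑ]{2,10})\)")

    def harvest_abbreviations(text):
        """Map each parenthesised acronym to the long form preceding it."""
        pairs = {}
        for before, short in DEFINITION.findall(text):
            words = before.split()[-len(short):]           # candidate long form
            initials = "".join(w[0].upper() for w in words)
            if initials == short:                          # initials must spell out the acronym
                pairs[short] = " ".join(words)
        return pairs

    # Hypothetical Spanish example; later mentions of "ELA" can then be covered too.
    text = "Paciente con esclerosis lateral amiotrófica (ELA). La ELA progresa lentamente."
    print(harvest_abbreviations(text))   # {'ELA': 'esclerosis lateral amiotrófica'}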


Subjects
Neural Networks, Computer; Writing; Natural Language Processing
2.
J Biomed Inform; 145: 104461, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37536643

ABSTRACT

BACKGROUND: Electronic Clinical Narratives (ECNs) store valuable information about an individual's health. However, little open-source data of this kind is available. Moreover, ECNs can be structurally heterogeneous, ranging from documents with explicit section headings or titles to unstructured notes. This lack of structure complicates both building automatic systems and evaluating them. OBJECTIVE: The aim of the present work is to provide the scientific community with a Spanish open-source dataset to build and evaluate automatic section identification systems. Together with this dataset, the purpose is to design and implement a suitable evaluation measure and a fine-tuned language model adapted to the task. MATERIALS AND METHODS: A corpus of unstructured clinical records, in this case progress notes written in Spanish, was annotated with seven major section types. Existing metrics for the presented task were thoroughly assessed and, based on the most suitable one, we defined a new B2 metric better tailored to the task. RESULTS: The annotated corpus, the new evaluation script and a baseline model are freely available to the community. The model reaches an average B2 score of 71.3 on our open-source dataset and an average B2 of 67.0 in data scarcity scenarios, where the target corpus and its structure differ from the dataset used to train the language model. CONCLUSION: Although section identification in unstructured clinical narratives is challenging, this work shows that it is possible to build competitive automatic systems when both data and the right evaluation metrics are available. The annotated data, the evaluation scripts, and the section identification language model are open-sourced in the hope that this contribution will foster the development of more and better systems.
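
The B2 metric is defined in the paper itself and is not reproduced here. Purely to illustrate how boundary-based section segmentation can be scored, the sketch below applies the standard Pk and WindowDiff measures from NLTK to hypothetical reference and predicted boundary sequences.

    # Illustrative sketch using standard segmentation metrics (Pk, WindowDiff),
    # not the paper's B2 metric. '1' marks a line that opens a new section.
    from nltk.metrics.segmentation import pk, windowdiff

    reference = "100010000100000100"   # hypothetical gold section boundaries
    predicted = "100010000010000100"   # hypothetical system output

    # Window size: half the average reference segment length (a common convention).
    k = max(2, round(len(reference) / (reference.count("1") * 2)))

    print("Pk        :", round(pk(reference, predicted, k=k), 3))   # lower is better
    print("WindowDiff:", round(windowdiff(reference, predicted, k), 3))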


Subjects
Electronic Health Records; Language; Natural Language Processing
3.
J Biomed Inform; 121: 103875, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34325020

ABSTRACT

BACKGROUND: Nowadays, with the digitalization of healthcare systems, huge amounts of clinical narratives are available. However, despite the wealth of information contained in them, interoperability and the extraction of relevant information from documents remain a challenge. OBJECTIVE: This work presents an approach towards automatically standardizing Spanish Electronic Discharge Summaries (EDSs) following the HL7 Clinical Document Architecture. We address the task of section annotation in EDSs written in Spanish, experimenting with three different approaches, with the aim of boosting interoperability across healthcare systems and hospitals. METHODS: The paper presents three different methods, ranging from a knowledge-based solution by means of manually constructed rules to supervised machine learning approaches, using state-of-the-art algorithms such as the Perceptron and transfer learning-based neural networks. RESULTS: The paper presents a detailed evaluation of the three approaches on data from two different hospitals. Overall, the best system obtains a 93.03% F-score for section identification. It is worth mentioning that this result is not completely homogeneous across section types and hospitals, showing that cross-hospital variability is greater in certain sections than in others. CONCLUSIONS: As its main result, this work demonstrates the feasibility of accurate automatic detection and standardization of section blocks in clinical narratives, opening the way to interoperability and secondary use of clinical data.
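
To give a flavour of the knowledge-based method, the sketch below detects section headings with hand-written rules. The Spanish heading names and the section types are hypothetical examples, not the rule set evaluated in the paper.

    # Illustrative sketch of a rule-based section detector; heading patterns are
    # invented examples, not the actual rules used in the study.
    import re

    SECTION_RULES = {
        "personal_history": r"^(antecedentes( personales)?)\s*:?",
        "current_illness":  r"^(enfermedad actual|motivo de ingreso)\s*:?",
        "physical_exam":    r"^(exploraci[oó]n f[ií]sica)\s*:?",
        "evolution":        r"^(evoluci[oó]n)\s*:?",
        "treatment":        r"^(tratamiento( al alta)?)\s*:?",
    }

    def tag_sections(report):
        """Return (line_number, section_type) for every line that looks like a heading."""
        hits = []
        for i, line in enumerate(report.splitlines()):
            for section, pattern in SECTION_RULES.items():
                if re.match(pattern, line.strip(), flags=re.IGNORECASE):
                    hits.append((i, section))
                    break
        return hits

    report = "ANTECEDENTES PERSONALES:\nHTA en tratamiento.\nEVOLUCIÓN:\nFavorable."
    print(tag_sections(report))   # [(0, 'personal_history'), (2, 'evolution')]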


Subjects
Electronic Health Records; Patient Discharge Summaries; Algorithms; Neural Networks, Computer; Reference Standards
4.
J Biomed Inform; 71: 16-30, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28526460

ABSTRACT

OBJECTIVE: The goal of this study is to investigate entity recognition within Electronic Health Records (EHRs), focusing on Spanish and Swedish. Of particular importance is a robust representation of the entities; in our case, we utilized unsupervised methods to generate such representations. METHODS: The significance of this work rests on its experimental layout: the experiments were carried out under the same conditions for both languages. Several classification approaches were explored: maximum probability, CRF, Perceptron and SVM. The classifiers were enhanced by means of ensembles of semantic spaces and ensembles of Brown trees. To mitigate data sparsity without a significant increase in the dimension of the decision space, we propose clustered approaches: hierarchical Brown clustering, represented by trees, and vector quantization for each semantic space. RESULTS: The results showed that the semi-supervised approaches significantly improved standard supervised techniques for both languages. Moreover, clustering the semantic spaces contributed to the quality of the entity recognition while keeping the dimension of the feature space two orders of magnitude lower than when directly using the semantic spaces. CONCLUSIONS: The contributions of this study are: (a) a set of thorough experiments that enable comparisons regarding the influence of different types of features on different classifiers, exploring two languages other than English; and (b) the use of ensembles of clusters of Brown trees and semantic spaces on EHRs to tackle the problem of scarcity of available annotated data.
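
A minimal sketch of how Brown-cluster bit strings can be turned into token features for a sequence classifier such as a CRF or Perceptron. The cluster assignments and prefix lengths are made-up placeholders; real clusters are induced from large unannotated corpora.

    # Illustrative sketch: Brown cluster bit-string prefixes as token features.
    # The cluster map is a placeholder; real clusters come from unannotated text.
    BROWN_CLUSTERS = {
        "dolor":       "110100",
        "cefalea":     "110101",
        "paracetamol": "011010",
        "ibuprofeno":  "011011",
    }

    PREFIX_LENGTHS = (2, 4, 6)   # coarse-to-fine views of the cluster hierarchy

    def token_features(tokens, i):
        """Features for tokens[i]: surface form plus truncated Brown paths."""
        word = tokens[i].lower()
        feats = {"word": word, "is_title": tokens[i].istitle()}
        path = BROWN_CLUSTERS.get(word)
        if path is not None:
            for n in PREFIX_LENGTHS:
                feats[f"brown_prefix_{n}"] = path[:n]   # shared prefixes group similar words
        return feats

    tokens = ["El", "paciente", "refiere", "cefalea", "tras", "tomar", "ibuprofeno"]
    print(token_features(tokens, 3))
    # {'word': 'cefalea', 'is_title': False, 'brown_prefix_2': '11', ...}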


Subjects
Electronic Health Records; Machine Learning; Semantics; Cluster Analysis; Data Curation; Humans; Sweden
5.
J Biomed Inform; 56: 318-32, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26141794

ABSTRACT

The advances achieved in Natural Language Processing make it possible to automatically mine information from electronically created documents. Many Natural Language Processing methods that extract information from texts make use of annotated corpora, but these are scarce in the clinical domain due to legal and ethical issues. In this paper we present the creation of the IxaMed-GS gold standard, composed of real electronic health records written in Spanish and manually annotated by experts in pharmacology and pharmacovigilance. The experts mainly annotated entities related to diseases and drugs, but also relationships between entities indicating adverse drug reaction events. To help the experts in the annotation task, we adapted a general corpus linguistic analyzer to the medical domain. The quality of the annotation process in the IxaMed-GS corpus was assessed by measuring the inter-annotator agreement, which was 90.53% for entities and 82.86% for events. In addition, the corpus has been used for the automatic extraction of adverse drug reaction events using machine learning.
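
The paper reports agreement figures without spelling out the computation here. The sketch below shows one common way to score exact-span entity agreement between two annotators, as an F1-style measure over (start, end, label) triples, using made-up annotation sets.

    # Illustrative sketch: exact-span agreement between two annotators.
    # The annotation sets are invented examples.
    def span_agreement(ann_a, ann_b):
        """F1 over exact (start, end, label) matches between two annotators."""
        if not ann_a or not ann_b:
            return 0.0
        matches = len(ann_a & ann_b)
        if matches == 0:
            return 0.0
        precision = matches / len(ann_b)
        recall = matches / len(ann_a)
        return 2 * precision * recall / (precision + recall)

    annotator_a = {(12, 20, "disease"), (34, 45, "drug"), (60, 72, "disease")}
    annotator_b = {(12, 20, "disease"), (34, 45, "drug"), (61, 72, "disease")}
    print(f"{span_agreement(annotator_a, annotator_b):.2%}")   # 66.67%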


Subjects
Adverse Drug Reaction Reporting Systems; Data Mining/methods; Drug-Related Side Effects and Adverse Reactions; Electronic Health Records/standards; Natural Language Processing; Algorithms; Automation; Language; Linguistics; Machine Learning; Pharmaceutical Preparations; Pharmacovigilance; Predictive Value of Tests; Reproducibility of Results; Translating
6.
Int J Med Inform; 129: 49-59, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31445289

ABSTRACT

BACKGROUND: Automatic extraction of the morbid diseases or conditions contained in death certificates is a critical process, useful for billing, epidemiological studies and comparisons across countries. The fact that these clinical documents are written in regular natural language makes the automatic coding process difficult because, often, spontaneous terms diverge strongly from standard reference terminologies such as the International Classification of Diseases (ICD). OBJECTIVE: Our aim is to propose a general and multilingual approach to render diagnostic terms into the standard framework provided by the ICD. We have evaluated our proposal on a set of clinical texts written in French, Hungarian and Italian. METHODS: ICD-10 encoding is a multi-class classification problem with an extensive number of classes (thousands). After considering several approaches, we tackle our objective as a sequence-to-sequence task and, in line with current trends, we opted for neural networks. We tested different types of neural architectures on three datasets in which Diagnostic Terms (DTs) are paired with their ICD-10 codes. RESULTS AND CONCLUSIONS: Our results set a new state of the art in multilingual ICD-10 coding, outperforming several alternative approaches and showing the feasibility of automatic ICD-10 prediction, with an F-measure of 0.838, 0.963 and 0.952 for French, Hungarian and Italian, respectively. Additionally, the results are interpretable, providing experts with supporting evidence when confronted with coding decisions, as the model is able to show the alignments between the original text and each output code.
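
To make the sequence-to-sequence framing concrete, here is a minimal, hypothetical character-level encoder-decoder in PyTorch. It is not the architecture evaluated in the paper (which, for instance, also exposes the text-to-code alignments mentioned above), and the example term-code pair is invented for illustration.

    # Illustrative sketch (not the authors' model): a character-level
    # encoder-decoder for diagnostic-term-to-code translation. A real system
    # would add attention, batching and a full training loop.
    import torch
    import torch.nn as nn

    PAD, SOS, EOS = 0, 1, 2

    def encode(text, vocab):
        # Map characters to integer ids, adding unseen characters on the fly.
        return [vocab.setdefault(ch, len(vocab)) for ch in text]

    class Encoder(nn.Module):
        def __init__(self, vocab_size, hidden=128):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, hidden, padding_idx=PAD)
            self.rnn = nn.GRU(hidden, hidden, batch_first=True)

        def forward(self, x):
            _, h = self.rnn(self.emb(x))
            return h                          # final state summarises the diagnostic term

    class Decoder(nn.Module):
        def __init__(self, vocab_size, hidden=128):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, hidden, padding_idx=PAD)
            self.rnn = nn.GRU(hidden, hidden, batch_first=True)
            self.out = nn.Linear(hidden, vocab_size)

        def forward(self, y, h):
            o, h = self.rnn(self.emb(y), h)
            return self.out(o), h             # logits over output characters

    # Toy usage with an invented (term, code) training pair.
    src_vocab = {"<pad>": PAD}
    tgt_vocab = {"<pad>": PAD, "<sos>": SOS, "<eos>": EOS}
    src = torch.tensor([encode("insuffisance cardiaque", src_vocab)])
    tgt = torch.tensor([[SOS] + encode("I509", tgt_vocab) + [EOS]])

    enc, dec = Encoder(len(src_vocab) + 50), Decoder(len(tgt_vocab) + 50)
    logits, _ = dec(tgt[:, :-1], enc(src))    # teacher forcing: decoder input is the shifted target
    loss = nn.CrossEntropyLoss(ignore_index=PAD)(
        logits.reshape(-1, logits.size(-1)), tgt[:, 1:].reshape(-1))
    loss.backward()                           # one illustrative forward/backward pass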


Subjects
Deep Learning; Electronic Health Records; International Classification of Diseases; Neural Networks, Computer
7.
Comput Methods Programs Biomed; 164: 111-119, 2018 Oct.
Article in English | MEDLINE | ID: mdl-30195419

ABSTRACT

BACKGROUND AND OBJECTIVES: Electronic health records (EHRs) convey vast and valuable knowledge about dynamically changing clinical practices. Indeed, clinical documentation entails the inspection of a massive number of records across hospitals and hospital sections. The goal of this study is to provide an efficient framework that helps clinicians explore EHRs and obtain alternative views related to both patient segments and diseases, such as clusters and statistical information about the development of heart diseases (replacement of pacemakers, valve implantation, etc.) in co-occurrence with other diseases. The task is challenging, as it deals with lengthy health records and a high number of classes in a multi-label setting. METHODS: Latent Dirichlet Allocation (LDA) is a statistical procedure that explains each document by a multinomial distribution over its latent topics, and each topic by a distribution over related words. These distributions make it possible to represent collections of texts in a continuous space, enabling distance-based associations between documents and also revealing the underlying topics. The topic models were assessed by means of four divergence metrics. In addition, we applied LDA to the task of multi-label document classification of EHRs according to the International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10). The EHRs had 7 assigned codes on average, out of 970 different codes corresponding to cardiology. RESULTS: First, the discriminative ability of the topic models was assessed using dissimilarity metrics. Nevertheless, there was an open question regarding the interpretability of the automatically discovered topics. To address this issue, we explored the connection between the latent topics and ICD-10. EHRs were represented by means of LDA and, next, supervised classifiers were inferred from those representations. Given the low-dimensional representation provided by LDA, the search was computationally efficient compared to symbolic approaches such as TF-IDF. The classifiers achieved an average AUC of 77.79. As a side contribution, with this work we released the software, implemented in Python and R, to both train and evaluate the models. CONCLUSIONS: Topic modeling offers a means of representing EHRs in a low-dimensional continuous space. This representation conveys relevant information as hidden topics in a comprehensive manner. Moreover, in practice, this compact representation made it possible to extract the ICD-10 codes associated with EHRs.
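
The released Python/R software is the reference implementation. As a rough, self-contained illustration of the pipeline described above (LDA representations feeding a multi-label ICD-10 classifier), the sketch below uses scikit-learn on toy notes with made-up code assignments.

    # Illustrative sketch (not the released software): LDA topic vectors as a
    # compact EHR representation, then one binary classifier per ICD-10 code.
    # Documents and code assignments are toy examples.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.preprocessing import MultiLabelBinarizer
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.linear_model import LogisticRegression

    docs = [
        "recambio de marcapasos por agotamiento del generador",
        "implante de válvula aórtica por estenosis severa",
        "insuficiencia cardiaca descompensada con fibrilación auricular",
        "revisión de marcapasos y ajuste de tratamiento anticoagulante",
    ]
    codes = [{"Z45.0"}, {"I35.0", "Z95.2"}, {"I50.9", "I48.9"}, {"Z45.0", "Z79.01"}]

    # 1) Bag-of-words, then LDA projects each note into a low-dimensional topic space.
    bow = CountVectorizer().fit_transform(docs)
    topics = LatentDirichletAllocation(n_components=3, random_state=0).fit_transform(bow)

    # 2) Multi-label classification over the topic representation.
    y = MultiLabelBinarizer().fit_transform(codes)
    clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(topics, y)
    print(clf.predict(topics[:1]))   # indicator vector over the known codes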


Subjects
Cardiology/statistics & numerical data; Electronic Health Records/classification; Cardiology/trends; Data Mining; Electronic Health Records/statistics & numerical data; Humans; International Classification of Diseases; Models, Statistical
8.
Int J Med Inform; 110: 111-117, 2018 Feb.
Article in English | MEDLINE | ID: mdl-29331250

ABSTRACT

BACKGROUND: Electronic Health Records (EHRs) are written using spontaneous natural language. Often, terms do not match standard terminology such as that available through the International Classification of Diseases (ICD). OBJECTIVE: Information retrieval and exchange can be improved using standard terminology. Our aim is to render diagnostic terms written in spontaneous language in EHRs into the standard framework provided by the ICD. METHODS: We tackle diagnostic term normalization employing Weighted Finite-State Transducers (WFSTs). Given a set of samples, these machines learn how to translate sequences; in our case, from spontaneous representations into standard representations. They are highly flexible and easily adaptable to the terminological singularities of each hospital and practitioner. In addition, we implemented a similarity metric to enhance spontaneous-to-standard term matching. RESULTS: Of the 2850 randomly selected spontaneous diagnostic terms (DTs), only 7.71% were written in their standard form matching the ICD. The WFST-based system matched spontaneous terms to ICD codes with a Mean Reciprocal Rank of 0.68, which means that, on average, the right ICD code is found between the first and second positions in the normalized set of candidates. This facilitates efficient document exchange and, furthermore, information retrieval. CONCLUSION: Medical term normalization was achieved with high performance. We found that direct matching of spontaneous terms against standard lexicons leads to unsatisfactory results, while normalized hypothesis generation by means of WFSTs helps to overcome the gap between spontaneous and standard language.
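
Building an actual WFST is beyond a short snippet, so the sketch below deliberately swaps in a plain string-similarity ranking (difflib) over a tiny made-up ICD lexicon to illustrate the candidate generation and ranking idea; it is not the WFST approach of the paper.

    # Illustrative stand-in for the paper's WFSTs: rank standard ICD descriptions
    # by string similarity to a spontaneous diagnostic term. The lexicon and the
    # query are invented for illustration.
    from difflib import SequenceMatcher

    ICD_LEXICON = {
        "I10": "hipertensión esencial (primaria)",
        "E11": "diabetes mellitus tipo 2",
        "I50": "insuficiencia cardiaca",
        "J44": "otra enfermedad pulmonar obstructiva crónica",
    }

    def normalize(spontaneous_term, top_n=3):
        """Return the top-n (code, description, score) candidates for a term."""
        term = spontaneous_term.lower()
        scored = [
            (code, desc, SequenceMatcher(None, term, desc).ratio())
            for code, desc in ICD_LEXICON.items()
        ]
        return sorted(scored, key=lambda x: x[2], reverse=True)[:top_n]

    # Spontaneous spelling with shorthand ("hta" for hipertensión arterial).
    print(normalize("hta esencial"))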


Subjects
Electronic Health Records/standards; Information Storage and Retrieval/standards; International Classification of Diseases/standards; Terminology as Topic; Humans; Medical Informatics Applications; Natural Language Processing