1.
BMC Bioinformatics ; 24(1): 265, 2023 Jun 26.
Article in English | MEDLINE | ID: mdl-37365501

ABSTRACT

BACKGROUND: Unlike diseases, automatic recognition of disabilities has not received the same attention in medical NLP. Progress in this direction is hampered by obstacles such as the lack of annotated corpora. Neural architectures learn to translate sequences from spontaneous representations into their corresponding standard representations given a set of samples. The aim of this paper is to present the latest advances in monolingual (Spanish) and crosslingual (English to Spanish and vice versa) automatic disability annotation. The task consists of identifying disability mentions in Spanish medical texts drawn from a collection of abstracts of biomedical journal papers. RESULTS: To carry out the task, we combined deep learning models that use different embedding granularities for sequence-to-sequence tagging with a simple acronym and abbreviation detection module to boost coverage. CONCLUSIONS: Our monolingual experiments demonstrate that a good combination of different word embedding representations provides better results than single representations, significantly outperforming the state of the art in disability annotation in Spanish. Additionally, we experimented with crosslingual (zero-shot) transfer for disability annotation between English and Spanish, with interesting results that might help overcome the data scarcity bottleneck, which is especially acute for disabilities.
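The abstract does not say how the tagger output and the acronym module are combined. The Python sketch below shows one plausible combination under stated assumptions: the tagger emits one BIO label per token, the acronym module is a plain lookup in a hypothetical lexicon, and lexicon hits are kept only where they do not overlap a tagger span. The function names and the lexicon entry are illustrative, not taken from the paper.

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence into (start, end) token spans, end exclusive."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:  # a new B closes any open span
                spans.append((start, i))
            start = i
        elif tag == "I":
            if start is None:  # tolerate an I with no preceding B
                start = i
        else:  # "O" closes any open span
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def acronym_spans(tokens, lexicon):
    """Single-token spans whose surface form is in a (hypothetical) acronym lexicon."""
    return [(i, i + 1) for i, tok in enumerate(tokens) if tok in lexicon]


def merge_spans(tagger_spans, extra_spans):
    """Add dictionary spans that do not overlap any tagger span."""
    merged = list(tagger_spans)
    for s, e in extra_spans:
        if all(e <= ts or s >= te for ts, te in tagger_spans):
            merged.append((s, e))
    return sorted(merged)


tokens = ["Pacientes", "con", "discapacidad", "intelectual", "y", "TEA"]
tags = ["O", "O", "B", "I", "O", "O"]
lexicon = {"TEA"}  # hypothetical lexicon entry
print(merge_spans(bio_to_spans(tags), acronym_spans(tokens, lexicon)))  # [(2, 4), (5, 6)]
```

Here the tagger finds `discapacidad intelectual` and the lookup adds the acronym the tagger missed; giving the tagger priority on overlaps is a design choice of this sketch, not something the abstract specifies.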


Subjects
Neural Networks, Computer ; Writing ; Natural Language Processing
2.
J Biomed Inform ; 145: 104461, 2023 09.
Article in English | MEDLINE | ID: mdl-37536643

ABSTRACT

BACKGROUND: Electronic Clinical Narratives (ECNs) store valuable information about individuals' health. However, few open-source datasets are available. Moreover, ECNs can be structurally heterogeneous, ranging from documents with explicit section headings or titles to unstructured notes. This lack of structure complicates both building automatic systems and evaluating them. OBJECTIVE: The aim of the present work is to provide the scientific community with a Spanish open-source dataset for building and evaluating automatic section identification systems. Alongside this dataset, we aim to design and implement a suitable evaluation measure and a language model (LM) fine-tuned for the task. MATERIALS AND METHODS: A corpus of unstructured clinical records, in this case progress notes written in Spanish, was annotated with seven major section types. Existing metrics for the task were thoroughly assessed and, building on the most suitable one, we defined a new metric, B2, better tailored to the task. RESULTS: The annotated corpus, the newly designed evaluation script and a baseline model are freely available to the community. This model reaches an average B2 score of 71.3 on our open-source dataset and an average B2 of 67.0 in data-scarcity scenarios where the target corpus and its structure differ from the dataset used to train the LM. CONCLUSION: Although section identification in unstructured clinical narratives is challenging, this work shows that it is possible to build competitive automatic systems when both data and the right evaluation metrics are available. The annotated data, the evaluation scripts and the section identification language model are open-sourced in the hope of fostering the development of more and better systems.


Subjects
Electronic Health Records ; Language ; Natural Language Processing
3.
J Biomed Inform ; 121: 103875, 2021 09.
Article in English | MEDLINE | ID: mdl-34325020

ABSTRACT

BACKGROUND: With the digitalization of healthcare systems, large amounts of clinical narratives are now available. However, despite the wealth of information they contain, interoperability and the extraction of relevant information from these documents remain challenging. OBJECTIVE: This work presents an approach to automatically standardizing Spanish Electronic Discharge Summaries (EDS) following the HL7 Clinical Document Architecture. We address the task of section annotation in EDSs written in Spanish, with the aim of boosting interoperability across healthcare systems and hospitals. METHODS: The paper presents three methods, ranging from a knowledge-based solution using manually constructed rules to supervised Machine Learning approaches based on state-of-the-art algorithms such as the Perceptron and transfer learning-based Neural Networks. RESULTS: The paper presents a detailed evaluation of the three approaches on data from two hospitals. Overall, the best system obtains a 93.03% F-score for section identification. Notably, this result is not uniform across section types and hospitals, showing that cross-hospital variability is larger for some sections than for others. CONCLUSIONS: This work demonstrates the feasibility of accurate automatic detection and standardization of section blocks in clinical narratives, opening the way to interoperability and secondary use of clinical data.
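The abstract reports a single overall F-score while noting that results vary across section types. As a minimal sketch, assuming each section is represented as a `(start, end, label)` tuple and scored by exact match (the paper's actual matching criterion is not stated here), per-type and micro-averaged precision, recall and F-score can be computed as follows. The section labels in the example are hypothetical.

```python
def prf(tp, fp, fn):
    """Precision, recall and F-score from raw counts, guarding against zero division."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def section_scores(gold, pred):
    """Exact-match scoring of (start, end, label) sections.

    Returns the micro-averaged (p, r, f) and a dict of per-label (p, r, f).
    """
    gold, pred = set(gold), set(pred)
    per_label = {}
    for lab in {s[2] for s in gold | pred}:
        g = {s for s in gold if s[2] == lab}
        q = {s for s in pred if s[2] == lab}
        tp = len(g & q)
        per_label[lab] = prf(tp, len(q) - tp, len(g) - tp)
    tp = len(gold & pred)
    micro = prf(tp, len(pred) - tp, len(gold) - tp)
    return micro, per_label


gold = {(0, 3, "diagnosis"), (3, 7, "treatment")}
pred = {(0, 3, "diagnosis"), (3, 6, "treatment")}
micro, per_label = section_scores(gold, pred)
print(micro)  # (0.5, 0.5, 0.5)
```

In this toy case one section type is recognized perfectly and the other not at all, so the micro F-score of 0.5 hides a per-type spread from 1.0 to 0.0, the kind of non-uniformity the abstract describes.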


Subjects
Electronic Health Records ; Patient Discharge Summaries ; Algorithms ; Neural Networks, Computer ; Reference Standards
4.
Artif Intell Med ; 157: 102985, 2024 Sep 30.
Article in English | MEDLINE | ID: mdl-39383708

ABSTRACT

Developing technology to assist medical experts in their everyday decision-making is currently a hot topic in Artificial Intelligence (AI). This is especially true within the framework of Evidence-Based Medicine (EBM), where the aim is to facilitate the extraction of relevant information using natural language as a tool for mediating human-AI interaction. In this context, AI techniques can help find the arguments behind past decisions in evolution notes or patient journeys, especially when several doctors are involved in a patient's care, since these documents record the decision-making process that led to the patient's treatment. Applying Natural Language Processing (NLP) techniques therefore has the potential to help doctors extract these arguments and gain a more comprehensive understanding of the decisions made. This work focuses on the explanatory argument identification step, framing it as a Question Answering (QA) task in which clinicians ask the AI model questions to help them identify those arguments. To explore the capabilities of current AI-based language models, we present a new dataset which, unlike previous work: (i) includes not only explanatory arguments for the correct hypothesis but also arguments explaining why the other hypotheses are incorrect; and (ii) contains explanations originally written in Spanish by doctors reasoning over cases from the Spanish Residency Medical Exams. Furthermore, this new benchmark allows us to set up a novel extractive task: identifying, within an argumentative text, the explanation written by medical doctors that supports the correct answer. An additional benefit of our approach is that the extractive performance of language models can be evaluated with automatic metrics; on the Antidote CasiMedicos dataset, this performance corresponds to an F1 score of 74.47.
Comprehensive experimentation shows that our novel dataset and approach are effective in helping practitioners identify relevant evidence-based explanations for medical questions.
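Extractive QA is commonly scored with a token-overlap F1 in the style of SQuAD. The abstract does not specify the exact variant behind the 74.47 figure, so the following Python sketch is an assumption: it lowercases, strips ASCII punctuation plus the Spanish marks `¿` and `¡`, splits on whitespace and, unlike the English SQuAD script, does not remove articles. Averaging per-question scores and scaling to 0-100 is likewise an assumed convention.

```python
import string
from collections import Counter

# ASCII punctuation plus Spanish opening marks (an assumed normalization choice).
PUNCT = set(string.punctuation) | {"¿", "¡"}


def normalize(text):
    """Lowercase, drop punctuation characters and split on whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCT)
    return text.split()


def token_f1(prediction, reference):
    """Token-overlap F1 between a predicted span and a reference span."""
    pred, ref = normalize(prediction), normalize(reference)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def dataset_f1(predictions, references):
    """Mean per-example token F1, scaled to 0-100."""
    scores = [token_f1(p, r) for p, r in zip(predictions, references)]
    return 100 * sum(scores) / len(scores) if scores else 0.0


print(token_f1("¿Por qué?", "por qué"))  # 1.0
```

Token-level overlap gives partial credit when a predicted explanation shares most but not all of its words with the reference, which exact-match scoring would count as a plain miss.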

5.
PLoS One ; 14(9): e0221639, 2019.
Article in English | MEDLINE | ID: mdl-31483814

ABSTRACT

Lately, discourse structure has received considerable attention because of the benefits it offers in several NLP tasks such as opinion mining, summarization, question answering and text simplification. When automatically analyzing texts, discourse parsers typically perform two tasks: (i) identifying basic discourse units (text segmentation) and (ii) linking discourse units by means of discourse relations, building structures such as trees or graphs. The resulting discourse structures are generally accurate for intra-sentence relations; however, they fail to capture the correct inter-sentence relations. Detecting the main discourse unit (the Central Unit) helps discourse analyzers (and manual annotators) improve their rhetorical labeling. With this in mind, we set out to build the first two steps of a discourse parser following a top-down strategy: (i) finding discourse units and (ii) detecting the Central Unit. The final step, assigning rhetorical relations, is left for future work. In line with this strategy, our paper presents a tool consisting of a discourse segmenter and an automatic Central Unit detector.


Subjects
Natural Language Processing ; Automation
6.
Int J Med Inform ; 129: 49-59, 2019 09.
Article in English | MEDLINE | ID: mdl-31445289

ABSTRACT

BACKGROUND: Automatic extraction of the morbid diseases or conditions contained in Death Certificates is a critical process, useful for billing, epidemiological studies and comparison across countries. Because these clinical documents are written in ordinary natural language, automatic coding is difficult: spontaneous terms often diverge strongly from standard reference terminologies such as the International Classification of Diseases (ICD). OBJECTIVE: Our aim is to propose a general, multilingual approach to render Diagnostic Terms into the standard framework provided by the ICD. We evaluated our proposal on a set of clinical texts written in French, Hungarian and Italian. METHODS: ICD-10 encoding is a multi-class classification problem with thousands of classes. After considering several approaches, we tackle it as a sequence-to-sequence task. Following current trends, we opted for neural networks and tested different neural architectures on three datasets in which Diagnostic Terms (DTs) are associated with their ICD-10 codes. RESULTS AND CONCLUSIONS: Our results set a new state of the art in multilingual ICD-10 coding, outperforming several alternative approaches and demonstrating the feasibility of automatic ICD-10 prediction, with F-measures of 0.838, 0.963 and 0.952 for French, Hungarian and Italian, respectively. Additionally, the results are interpretable: because the model can show the alignments between the original text and each output code, it provides experts with supporting evidence when making coding decisions.


Subjects
Deep Learning ; Electronic Health Records ; International Classification of Diseases ; Neural Networks, Computer
7.
IEEE J Biomed Health Inform ; 22(4): 1323-1329, 2018 07.
Article in English | MEDLINE | ID: mdl-28858819

ABSTRACT

This work focuses on data mining applied to the clinical documentation domain. Diagnostic terms (DTs) are used as keywords to retrieve valuable information from electronic health records, and they are encoded manually by experts following the International Classification of Diseases (ICD). The goal of this work is to explore how text mining can aid DT encoding. From the machine learning (ML) perspective, this is a high-dimensional classification task, as it comprises thousands of codes. This work investigates a robust representation of the instances to improve ML results. The proposed system finds the right ICD code among more than 1500 possible codes with 92% precision for the main disease (primary class) and 88% for the main disease together with its nonessential modifiers (fully specified class). The methodology employed is simple and portable. According to experts from public hospitals, the system is particularly useful for documentation and pharmacosurveillance services; they reported an accuracy of 91.2% on a small, randomly extracted test set. Accordingly, we have made the software publicly available along with this paper to help the clinical and research community.


Subjects
Documentation/methods ; Electronic Health Records ; International Classification of Diseases ; Machine Learning ; Data Mining/methods ; Humans ; Natural Language Processing
8.
Int J Med Inform ; 110: 111-117, 2018 02.
Article in English | MEDLINE | ID: mdl-29331250

ABSTRACT

BACKGROUND: Electronic Health Records (EHRs) are written in spontaneous natural language. Often, their terms do not match standard terminologies such as the one provided by the International Classification of Diseases (ICD). OBJECTIVE: Information retrieval and exchange can be improved by using standard terminology. Our aim is to render diagnostic terms (DTs) written in spontaneous language in EHRs into the standard framework provided by the ICD. METHODS: We tackle diagnostic term normalization with Weighted Finite-State Transducers (WFSTs). These machines learn, from a set of samples, how to translate sequences; in our case, spontaneous representations into standard representations. They are highly flexible and easily adapted to the terminological peculiarities of each hospital and practitioner. In addition, we implemented a similarity metric to enhance spontaneous-standard term matching. RESULTS: Of 2850 randomly selected spontaneous DTs, only 7.71% were written in their standard ICD form. The WFST-based system matched spontaneous terms to ICD codes with a Mean Reciprocal Rank of 0.68, which means that, on average, the right ICD code is found between the first and second position in the normalized set of candidates. This enables efficient document exchange and information retrieval. CONCLUSION: Medical term normalization was achieved with high performance. We found that directly matching spontaneous terms against standard lexicons leads to unsatisfactory results, whereas generating normalized hypotheses with WFSTs helped bridge the gap between spontaneous and standard language.
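Mean Reciprocal Rank averages, over all queries, the inverse of the position at which the correct code first appears in the ranked candidate list, counting 0 when it never appears. A minimal Python sketch follows; the function names are illustrative, and the ICD-10 codes in the example are arbitrary and not data from the study.

```python
def reciprocal_rank(candidates, gold):
    """Return 1/rank of the first candidate equal to gold, or 0.0 if absent."""
    for rank, code in enumerate(candidates, start=1):
        if code == gold:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(ranked_lists, golds):
    """Average reciprocal rank over paired candidate lists and gold codes."""
    if len(ranked_lists) != len(golds) or not golds:
        raise ValueError("need the same, non-zero number of lists and gold codes")
    total = sum(reciprocal_rank(c, g) for c, g in zip(ranked_lists, golds))
    return total / len(golds)


ranked = [["I10", "I15.9"], ["E11.9", "E10.9"], ["J45.9"]]
golds = ["I10", "E10.9", "J44.9"]
print(mean_reciprocal_rank(ranked, golds))  # (1 + 0.5 + 0) / 3 = 0.5
```

A correct code at rank 1 contributes 1.0 and one at rank 2 contributes 0.5, so an MRR of 0.68 lies between those two values, which is the reading the abstract gives it.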


Subjects
Electronic Health Records/standards ; Information Storage and Retrieval/standards ; International Classification of Diseases/standards ; Terminology as Topic ; Humans ; Medical Informatics Applications ; Natural Language Processing