Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 10 de 10
Filtrar
1.
PLoS One ; 19(7): e0305362, 2024.
Artículo en Inglés | MEDLINE | ID: mdl-38976665

RESUMEN

Disinformation in the medical field is a growing problem that carries a significant risk. Therefore, it is crucial to detect and combat it effectively. In this article, we provide three elements to aid in this fight: 1) a new framework that collects health-related articles from verification entities and facilitates their check-worthiness and fact-checking annotation at the sentence level; 2) a corpus generated using this framework, composed of 10335 sentences annotated in these two concepts and grouped into 327 articles, which we call KEANE (faKe nEws At seNtence lEvel); and 3) a new model for verifying fake news that combines specific identifiers of the medical domain with triplets subject-predicate-object, using Transformers and feedforward neural networks at the sentence level. This model predicts the fact-checking of sentences and evaluates the veracity of the entire article. After training this model on our corpus, we achieved remarkable results in the binary classification of sentences (check-worthiness F1: 0.749, fact-checking F1: 0.698) and in the final classification of complete articles (F1: 0.703). We also tested its performance against another public dataset and found that it performed better than most systems evaluated on that dataset. Moreover, the corpus we provide differs from other existing corpora in its duality of sentence-article annotation, which can provide an additional level of justification of the prediction of truth or untruth made by the model.


Asunto(s)
Desinformación , Humanos , Redes Neurales de la Computación , Procesamiento de Lenguaje Natural , Decepción
2.
J Biomed Inform ; 138: 104279, 2023 02.
Artículo en Inglés | MEDLINE | ID: mdl-36610608

RESUMEN

BACKGROUND AND OBJECTIVES: Named Entity Recognition (NER) and Relation Extraction (RE) are two of the most studied tasks in biomedical Natural Language Processing (NLP). The detection of specific terms and entities and the relationships between them are key aspects for the development of more complex automatic systems in the biomedical field. In this work, we explore transfer learning techniques for incorporating information about negation into systems performing NER and RE. The main purpose of this research is to analyse to what extent the successful detection of negated entities in separate tasks helps in the detection of biomedical entities and their relationships. METHODS: Three neural architectures are proposed in this work, all of them mainly based on Bidirectional Long Short-Term Memory (Bi-LSTM) networks and Conditional Random Fields (CRFs). While the first architecture is devoted to detecting triggers and scopes of negated entities in any domain, two specific models are developed for performing isolated NER tasks and joint NER and RE tasks in the biomedical domain. Then, weights related to negation detection learned by the first architecture are incorporated into those last models. Two different languages, Spanish and English, are taken into account in the experiments. RESULTS: Performance of the biomedical models is analysed both when the weights of the neural networks are randomly initialized, and when weights from the negation detection model are incorporated into them. Improvements of around 3.5% of F-Measure in the English language and more than 7% in the Spanish language are achieved in the NER task, while the NER+RE task increases F-Measure scores by more than 13% for the NER submodel and around 2% for the RE submodel. CONCLUSIONS: The obtained results allow us to conclude that negation-based transfer learning techniques are appropriate for performing biomedical NER and RE tasks. These results highlight the importance of detecting negation for improving the identification of biomedical entities and their relationships. The explored techniques show robustness by maintaining consistent results and improvements across different tasks and languages.


Asunto(s)
Lenguaje , Redes Neurales de la Computación , Procesamiento de Lenguaje Natural , Aprendizaje Automático
3.
Sci Rep ; 12(1): 18208, 2022 10 28.
Artículo en Inglés | MEDLINE | ID: mdl-36307506

RESUMEN

Acquired immunodeficiency syndrome (AIDS) is still one of the main health problems worldwide. It is therefore essential to keep making progress in improving the prognosis and quality of life of affected patients. One way to advance along this pathway is to uncover connections between other disorders associated with HIV/AIDS-so that they can be anticipated and possibly mitigated. We propose to achieve this by using Association Rules (ARs). They allow us to represent the dependencies between a number of diseases and other specific diseases. However, classical techniques systematically generate every AR meeting some minimal conditions on data frequency, hence generating a vast amount of uninteresting ARs, which need to be filtered out. The lack of manually annotated ARs has favored unsupervised filtering, even though they produce limited results. In this paper, we propose a semi-supervised system, able to identify relevant ARs among HIV-related diseases with a minimal amount of annotated training data. Our system has been able to extract a good number of relationships between HIV-related diseases that have been previously detected in the literature but are scattered and are often little known. Furthermore, a number of plausible new relationships have shown up which deserve further investigation by qualified medical experts.


Asunto(s)
Síndrome de Inmunodeficiencia Adquirida , Infecciones por VIH , Humanos , Calidad de Vida , Aprendizaje Automático
4.
BMC Med Inform Decis Mak ; 22(1): 20, 2022 01 24.
Artículo en Inglés | MEDLINE | ID: mdl-35073885

RESUMEN

BACKGROUND: Association Rules are one of the main ways to represent structural patterns underlying raw data. They represent dependencies between sets of observations contained in the data. The associations established by these rules are very useful in the medical domain, for example in the predictive health field. Classic algorithms for association rule mining give rise to huge amounts of possible rules that should be filtered in order to select those most likely to be true. Most of the proposed techniques for these tasks are unsupervised. However, the accuracy provided by unsupervised systems is limited. Conversely, resorting to annotated data for training supervised systems is expensive and time-consuming. The purpose of this research is to design a new semi-supervised algorithm that performs like supervised algorithms but uses an affordable amount of training data. METHODS: In this work we propose a new semi-supervised data mining model that combines unsupervised techniques (Fisher's exact test) with limited supervision. Starting with a small seed of annotated data, the model improves results (F-measure) obtained, using a fully supervised system (standard supervised ML algorithms). The idea is based on utilising the agreement between the predictions of the supervised system and those of the unsupervised techniques in a series of iterative steps. RESULTS: The new semi-supervised ML algorithm improves the results of supervised algorithms computed using the F-measure in the task of mining medical association rules, but training with an affordable amount of manually annotated data. CONCLUSIONS: Using a small amount of annotated data (which is easily achievable) leads to results similar to those of a supervised system. The proposal may be an important step for the practical development of techniques for mining association rules and generating new valuable scientific medical knowledge.


Asunto(s)
Algoritmos , Aprendizaje Automático Supervisado , Minería de Datos/métodos , Humanos
5.
Artif Intell Med ; 121: 102177, 2021 11.
Artículo en Inglés | MEDLINE | ID: mdl-34763812

RESUMEN

BACKGROUND AND OBJECTIVES: The 10th version of International Classification of Diseases (ICD-10) codification system has been widely adopted by the health systems of many countries, including Spain. However, manual code assignment of Electronic Health Records (EHR) is a complex and time-consuming task that requires a great amount of specialised human resources. Therefore, several machine learning approaches are being proposed to assist in the assignment task. In this work we present an alternative system for automatically recommending ICD-10 codes to be assigned to EHRs. METHODS: Our proposal is based on characterising ICD-10 codes by a set of keyphrases that represent them. These keyphrases do not only include those that have literally appeared in some EHR with the considered ICD-10 codes assigned, but also others that have been obtained by a statistical process able to capture expressions that have led the annotators to assign the code. RESULTS: The result is an information model that allows to efficiently recommend codes to a new EHR based on their textual content. We explore an approach that proves to be competitive with other state-of-the-art approaches and can be combined with them to optimise results. CONCLUSIONS: In addition to its effectiveness, the recommendations of this method are easily interpretable since the phrases in an EHR leading to recommend an ICD-10 code are known. Moreover, the keyphrases associated with each ICD-10 code can be a valuable additional source of information for other approaches, such as machine learning techniques.


Asunto(s)
Registros Electrónicos de Salud , Clasificación Internacional de Enfermedades , Humanos , Aprendizaje Automático , Proyectos de Investigación , Recursos Humanos
6.
Comput Methods Programs Biomed ; 164: 121-129, 2018 Oct.
Artículo en Inglés | MEDLINE | ID: mdl-30195420

RESUMEN

BACKGROUND AND OBJECTIVE: There is a huge amount of rare diseases, many of which have associated important disabilities. It is paramount to know in advance the evolution of the disease in order to limit and prevent the appearance of disabilities and to prepare the patient to manage the future difficulties. Rare disease associations are making an effort to manually collect this information, but it is a long process. A lot of information about the consequences of rare diseases is published in scientific papers, and our goal is to automatically extract disabilities associated with diseases from them. METHODS: This work presents a new corpus of abstracts from scientific papers related to rare diseases, which has been manually annotated with disabilities. This corpus allows to train machine and deep learning systems that can automatically process other papers, thus extracting new information about the relations between rare diseases and disabilities. The corpus is also annotated with negation and speculation when they appear affecting disabilities. The corpus has been made publicly accessible. RESULTS: We have devised some experiments using deep learning techniques to show the usefulness of the developed corpus. Specifically, we have designed a long short-term memory based architecture for disabilities identification, as well as a convolutional neural network for detecting their relationships to diseases. The systems designed do not need any preprocessing of the data, but only low dimensional vectors representing the words. CONCLUSIONS: The developed corpus will allow to train systems to identify disabilities in biomedical documents, which the current annotation systems are not able to detect. The system could also be trained to detect relationships between them and diseases, as well as negation and speculation, that can change the meaning of the language. The deep learning models designed for identifying disabilities and their relationships to diseases in new documents show that the corpus allows obtaining an F-measure of around 81% for the disability recognition and 75% for relation extraction.


Asunto(s)
Personas con Discapacidad/estadística & datos numéricos , Redes Neurales de la Computación , Enfermedades Raras/etiología , Minería de Datos , Bases de Datos Factuales/estadística & datos numéricos , Aprendizaje Profundo , Humanos , Procesamiento de Lenguaje Natural , Semántica
7.
Artif Intell Med ; 87: 9-19, 2018 05.
Artículo en Inglés | MEDLINE | ID: mdl-29573845

RESUMEN

Word sense disambiguation is a key step for many natural language processing tasks (e.g. summarization, text classification, relation extraction) and presents a challenge to any system that aims to process documents from the biomedical domain. In this paper, we present a new graph-based unsupervised technique to address this problem. The knowledge base used in this work is a graph built with co-occurrence information from medical concepts found in scientific abstracts, and hence adapted to the specific domain. Unlike other unsupervised approaches based on static graphs such as UMLS, in this work the knowledge base takes the context of the ambiguous terms into account. Abstracts downloaded from PubMed are used for building the graph and disambiguation is performed using the personalized PageRank algorithm. Evaluation is carried out over two test datasets widely explored in the literature. Different parameters of the system are also evaluated to test robustness and scalability. Results show that the system is able to outperform state-of-the-art knowledge-based systems, obtaining more than 10% of accuracy improvement in some cases, while only requiring minimal external resources.


Asunto(s)
Bases del Conocimiento , Procesamiento de Lenguaje Natural , Semántica , Algoritmos , Conjuntos de Datos como Asunto , PubMed , Unified Medical Language System
8.
J Biomed Inform ; 64: 320-332, 2016 12.
Artículo en Inglés | MEDLINE | ID: mdl-27815227

RESUMEN

Ambiguity in the biomedical domain represents a major issue when performing Natural Language Processing tasks over the huge amount of available information in the field. For this reason, Word Sense Disambiguation is critical for achieving accurate systems able to tackle complex tasks such as information extraction, summarization or document classification. In this work we explore whether multilinguality can help to solve the problem of ambiguity, and the conditions required for a system to improve the results obtained by monolingual approaches. Also, we analyze the best ways to generate those useful multilingual resources, and study different languages and sources of knowledge. The proposed system, based on co-occurrence graphs containing biomedical concepts and textual information, is evaluated on a test dataset frequently used in biomedicine. We can conclude that multilingual resources are able to provide a clear improvement of more than 7% compared to monolingual approaches, for graphs built from a small number of documents. Also, empirical results show that automatically translated resources are a useful source of information for this particular task.


Asunto(s)
Minería de Datos , Procesamiento de Lenguaje Natural , Algoritmos , Humanos , Bases del Conocimiento , Unified Medical Language System
9.
PLoS One ; 7(8): e43694, 2012.
Artículo en Inglés | MEDLINE | ID: mdl-22937081

RESUMEN

The size and complexity of actual networked systems hinders the access to a global knowledge of their structure. This fact pushes the problem of navigation to suboptimal solutions, one of them being the extraction of a coherent map of the topology on which navigation takes place. In this paper, we present a Markov chain based algorithm to tag networked terms according only to their topological features. The resulting tagging is used to compute similarity between terms, providing a map of the networked information. This map supports local-based navigation techniques driven by similarity. We compare the efficiency of the resulting paths according to their length compared to that of the shortest path. Additionally we claim that the path steps towards the destination are semantically coherent. To illustrate the algorithm performance we provide some results from the Simple English Wikipedia, which amounts to several thousand of pages. The simplest greedy strategy yields over an 80% of average success rate. Furthermore, the resulting content-coherent paths most often have a cost between one- and threefold compared to shortest-path lengths.


Asunto(s)
Algoritmos , Almacenamiento y Recuperación de la Información , Bases de Datos Factuales , Cadenas de Markov , Redes Neurales de la Computación
10.
Phys Rev E Stat Nonlin Soft Matter Phys ; 84(4 Pt 2): 046108, 2011 Oct.
Artículo en Inglés | MEDLINE | ID: mdl-22181228

RESUMEN

The mesoscopic structure of complex networks has proven a powerful level of description to understand the linchpins of the system represented by the network. Nevertheless, the mapping of a series of relationships between elements, in terms of a graph, is sometimes not straightforward. Given that all the information we would extract using complex network tools depend on this initial graph, it is mandatory to preprocess the data to build it on in the most accurate manner. Here we propose a procedure to build a network, attending only to statistically significant relations between constituents. We use a paradigmatic example of word associations to show the development of our approach. Analyzing the modular structure of the obtained network we are able to disentangle categorical relations, disambiguating words with success that is comparable to the best algorithms designed to the same end.

SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA