1.
Front Artif Intell ; 5: 970517, 2022.
Article in English | MEDLINE | ID: mdl-36213168

ABSTRACT

Resources for Natural Language Processing (NLP) are less numerous for languages other than English. In the clinical domain, where these resources are vital for obtaining new knowledge about human health and diseases, creating new resources for the Spanish language is imperative. One of the most common approaches in NLP is word embeddings, which are dense vector representations of a word that take the word's context into account. This vector representation is usually the first step in various NLP tasks, such as text classification or information extraction. Therefore, in order to enrich Spanish language NLP tools, we built a Spanish clinical corpus from waiting list diagnostic suspicions, a biomedical corpus from medical journals, and term sequences sampled from the Unified Medical Language System (UMLS). These three corpora can be used to compute word embedding models from scratch using the Word2vec and fastText algorithms. Furthermore, to validate the quality of the calculated embeddings, we adapted several evaluation datasets in English, including some tests that, to the best of our knowledge, have not been used in Spanish. These translations were validated by two bilingual clinicians following an ad hoc validation standard for the translation. Even though contextualized word embeddings nowadays receive enormous attention, their calculation and deployment require specialized hardware and giant training corpora. Our static embeddings can be used in clinical applications with limited computational resources. The validation of the intrinsic test we present here can help groups working on static and contextualized word embeddings. We are releasing the training corpus and the embeddings within this publication.
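
As an illustration of what an intrinsic test of static embeddings can look like, the following is a minimal sketch that assumes a similarity-style test: cosine similarity between word vectors is compared with human ratings using Spearman's rank correlation. The abstract does not say which tests were adapted, and the vectors, word pairs, and ratings below are invented toy values, not data from the released resources.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical 3-dimensional vectors (real models use hundreds of dimensions).
toy_vectors = {
    "fiebre": [0.9, 0.1, 0.0],
    "temperatura": [0.8, 0.2, 0.1],
    "fractura": [0.0, 0.9, 0.3],
    "hueso": [0.1, 0.7, 0.6],
}

# Hypothetical human similarity ratings for word pairs.
toy_gold = [
    ("fiebre", "temperatura", 8.5),
    ("fractura", "hueso", 7.0),
    ("fiebre", "hueso", 1.0),
]

model_scores = [cosine(toy_vectors[a], toy_vectors[b]) for a, b, _ in toy_gold]
human_scores = [score for _, _, score in toy_gold]
rho = spearman(model_scores, human_scores)
print(f"Spearman correlation on the toy test: {rho:.2f}")
```

A higher correlation means the embedding space orders word pairs more like the human raters do; with real embeddings the vectors would be loaded from the trained model rather than written by hand.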

2.
Rev Med Chil ; 149(7): 1014-1022, 2021 Jul.
Article in Spanish | MEDLINE | ID: mdl-34751303

ABSTRACT

BACKGROUND: A significant proportion of the clinical record is in free text format, making it difficult to extract key information and make secondary use of patient data. Automatic detection of information within narratives initially requires humans, following specific protocols and rules, to identify medical entities of interest. AIM: To build a linguistic resource of annotated medical entities on texts produced in Chilean hospitals. MATERIAL AND METHODS: A clinical corpus was constructed using 150 referrals in public hospitals. Three annotators identified six medical entities: clinical findings, diagnoses, body parts, medications, abbreviations, and family members. An annotation scheme was designed, and an iterative approach to train the annotators was applied. The F1-Score metric was used to assess the progress of the annotators' agreement during their training. RESULTS: An average F1-Score of 0.73 was observed at the beginning of the project. After the training period, it increased to 0.87. Annotation of clinical findings and body parts showed significant discrepancy, while abbreviations, medications, and family members showed high agreement. CONCLUSIONS: A linguistic resource with annotated medical entities on texts produced in Chilean hospitals was built and made available, working with annotators related to medicine. The iterative annotation approach allowed us to improve performance metrics. The corpus and annotation protocols will be released to the research community.
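
To make the agreement metric concrete, here is a minimal sketch of F1 as an inter-annotator agreement score. It rests on assumptions the abstract does not confirm: entities are compared by exact match on (start, end, label), one annotator is treated as the reference in each pair, and the three annotators are combined by averaging over pairs. The annotations are invented toy values.

```python
from itertools import combinations


def f1_agreement(annotations_a, annotations_b):
    """F1 between two annotators, treating A as reference and B as prediction.

    Each annotation is a (start, end, label) tuple; only exact matches count.
    """
    set_a, set_b = set(annotations_a), set(annotations_b)
    if not set_a and not set_b:
        return 1.0  # both annotators marked nothing: full agreement
    matches = len(set_a & set_b)
    if matches == 0:
        return 0.0
    precision = matches / len(set_b)
    recall = matches / len(set_a)
    return 2 * precision * recall / (precision + recall)


def mean_pairwise_f1(annotators):
    """Average F1 over every pair of annotators (an assumed aggregation)."""
    pairs = list(combinations(annotators, 2))
    return sum(f1_agreement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical annotations of one referral by three annotators.
annotator_a = [(0, 5, "finding"), (10, 18, "body_part"), (25, 30, "medication")]
annotator_b = [(0, 5, "finding"), (10, 18, "diagnosis"),
               (25, 30, "medication"), (40, 44, "abbreviation")]
annotator_c = list(annotator_a)

print(f"A vs B: {f1_agreement(annotator_a, annotator_b):.2f}")
print(f"Mean pairwise: {mean_pairwise_f1([annotator_a, annotator_b, annotator_c]):.2f}")
```

Under exact matching the score is symmetric, so it does not matter which annotator of a pair is taken as the reference; a relaxed (overlapping-span) match would give higher values.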


Subject(s)
Electronic Data Processing, Chile, Humans
3.
BMC Med Inform Decis Mak ; 21(1): 208, 2021 07 01.
Article in English | MEDLINE | ID: mdl-34210317

ABSTRACT

BACKGROUND: In Chile, a patient needing a specialty consultation or surgery must first be referred by a general practitioner and is then placed on a waiting list. The Explicit Health Guarantees (GES in Spanish) ensure, by law, a maximum time to resolve 85 health problems. Usually, a health professional manually verifies whether each referral, written in natural language, corresponds to a GES-covered disease. An error in this classification is catastrophic for patients, as it puts them on a non-prioritized waiting list, characterized by prolonged waiting times. METHODS: To support the manual process, we developed and deployed a system that automatically classifies referrals as GES-covered or not using historical data. Our system is based on word embeddings specially trained for clinical text produced in Chile. We used a vector representation of the reason for referral and the patient's age as features for training machine learning models using human-labeled historical data. We constructed a ground truth dataset combining classifications made by three healthcare experts, which was used to validate our results. RESULTS: The best performing model over ground truth reached an AUC score of 0.94, with a weighted F1-score of 0.85 (0.87 in precision and 0.86 in recall). During seven months of continuous and voluntary use, the system has amended 87 patient misclassifications. CONCLUSION: This system is the result of a collaboration between technical and clinical experts, and the design of the classifier was custom-tailored for a hospital's clinical workflow, which encouraged the voluntary use of the platform. Our solution can be easily expanded across other hospitals since the registry is uniform in Chile.
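
The feature construction described in METHODS can be sketched as follows. This is only an illustration under assumptions: the abstract does not say how the reason for referral is turned into a single vector, so averaging the word vectors of its tokens is assumed here, and the function name, the two-dimensional embeddings, and the example referral are all hypothetical.

```python
def referral_features(reason, age, vectors, dim):
    """Build a feature vector: mean word embedding of the text, plus age.

    `vectors` maps lowercase tokens to embedding lists of length `dim`.
    Tokens missing from `vectors` are ignored; if none are known, the
    text part of the feature vector is all zeros.
    """
    tokens = reason.lower().split()
    known = [vectors[t] for t in tokens if t in vectors]
    if known:
        mean_vec = [sum(column) / len(known) for column in zip(*known)]
    else:
        mean_vec = [0.0] * dim
    return mean_vec + [float(age)]


# Hypothetical 2-dimensional embeddings; real models are much larger.
toy_vectors = {
    "dolor": [1.0, 0.0],
    "rodilla": [0.0, 1.0],
}

features = referral_features("Dolor rodilla derecha", 64, toy_vectors, dim=2)
print(features)
```

The resulting list can be passed to any standard classifier together with the human-assigned GES label. In practice the age column would probably be scaled to a range comparable with the embedding values, which is a design choice not described in the abstract.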


Subject(s)
Medicine, Natural Language Processing, Chile, Public Hospitals, Humans, Machine Learning
5.
Rev. méd. Chile ; 149(7): 1014-1022, jul. 2021. ilus, graf
Article in Spanish | LILACS | ID: biblio-1389546

ABSTRACT

Background: A significant proportion of the clinical record is in free text format, making it difficult to extract key information and make secondary use of patient data. Automatic detection of information within narratives initially requires humans, following specific protocols and rules, to identify medical entities of interest. Aim: To build a linguistic resource of annotated medical entities on texts produced in Chilean hospitals. Material and Methods: A clinical corpus was constructed using 150 referrals in public hospitals. Three annotators identified six medical entities: clinical findings, diagnoses, body parts, medications, abbreviations, and family members. An annotation scheme was designed, and an iterative approach to train the annotators was applied. The F1-Score metric was used to assess the progress of the annotators' agreement during their training. Results: An average F1-Score of 0.73 was observed at the beginning of the project. After the training period, it increased to 0.87. Annotation of clinical findings and body parts showed significant discrepancy, while abbreviations, medications, and family members showed high agreement. Conclusions: A linguistic resource with annotated medical entities on texts produced in Chilean hospitals was built and made available, working with annotators related to medicine. The iterative annotation approach allowed us to improve performance metrics. The corpus and annotation protocols will be released to the research community.


Subject(s)
Humans, Electronic Data Processing, Chile
6.
Stud Health Technol Inform ; 270: 347-351, 2020 Jun 16.
Article in English | MEDLINE | ID: mdl-32570404

ABSTRACT

The amount of digital data derived from healthcare processes has increased tremendously in recent years. This applies especially to unstructured data, which are often hard to analyze due to the lack of available tools to process and extract information. Natural language processing is often used in medicine, but the majority of tools used by researchers are developed primarily for the English language. For developing and testing natural language processing methods, it is important to have a suitable corpus, specific to the medical domain, that covers the intended target language. To improve the potential of natural language processing research, we developed tools to derive language-specific medical corpora from publicly available text sources. In order to extract medicine-specific unstructured text data, openly available publications from biomedical journals were used in a four-step process: (1) medical journal databases were scraped to download the articles, (2) the articles were parsed and consolidated into a single repository, (3) the content of the repository was described, and (4) the text data and the codes were released. In total, 93 969 articles were retrieved, with a word count of 83 868 501 in three different languages (German, English, and Spanish) from two medical journal databases. Our results show that unstructured text data extraction from openly available medical journal databases for the construction of unified corpora of medical text data can be achieved through web scraping techniques.


Subject(s)
Data Mining, Multilingualism, Natural Language Processing, Unified Medical Language System
7.
Rev. méd. Chile ; 147(10): 1229-1238, oct. 2019. tab, graf
Article in Spanish | LILACS | ID: biblio-1058589

ABSTRACT

Background: Free text poses a challenge in health data analysis since the lack of structure makes the extraction and integration of information difficult, particularly in the case of massive data. An appropriate machine interpretation of electronic health records in Chile can unleash knowledge contained in large volumes of clinical texts, expanding clinical management and national research capabilities. Aim: To illustrate the use of a weighted frequency algorithm to find keywords. The algorithm was applied to the diagnostic suspicion field of the Chilean specialty consultation waiting list, for diseases not covered by the Chilean Explicit Health Guarantees plan. Material and Methods: The waiting lists for a first specialty consultation for the period 2008-2018 were obtained from 17 out of 29 Chilean health services, and a total of 2,592,925 diagnostic suspicions were identified. A natural language processing technique called Term Frequency-Inverse Document Frequency was used for the retrieval of diagnostic suspicion keywords. Results: For each specialty, the four keywords with the highest weighted frequency were determined. Word clouds showing words weighted by their importance were created to obtain a visual representation. These are available at cimt.uchile.cl/lechile/. Conclusions: The algorithm made it possible to summarize unstructured clinical free-text data, improving its usefulness and accessibility.
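
The keyword retrieval can be illustrated with a minimal Term Frequency-Inverse Document Frequency sketch. It assumes that all diagnostic suspicions of one specialty are concatenated into a single document and uses the plain weighting tf × log(N / df); the abstract does not state which TF-IDF variant or preprocessing was applied, and the specialties and texts below are invented toy values.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=4):
    """Return the `top_k` highest-scoring TF-IDF terms for each document.

    `docs` maps a document name (here, a specialty) to its raw text.
    tf = term count / document length; idf = log(N / document frequency).
    """
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)

    # Number of documents in which each term appears at least once.
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    keywords = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords[name] = ranked[:top_k]
    return keywords


# Hypothetical diagnostic suspicions grouped by specialty.
toy_docs = {
    "traumatologia": "dolor rodilla fractura rodilla",
    "oftalmologia": "dolor ojo catarata catarata",
    "cardiologia": "dolor pecho arritmia",
}

for specialty, terms in tfidf_keywords(toy_docs, top_k=2).items():
    print(specialty, terms)
```

A term such as "dolor", which occurs in every specialty, gets an inverse document frequency of zero and therefore drops to the bottom of each ranking, which is the behavior that makes the weighting useful for finding specialty-specific keywords.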


Subject(s)
Humans, Natural Language Processing, Electronic Data Processing/methods, Medical Records, Information Storage and Retrieval/methods, Diagnostic Techniques and Procedures, Data Mining/methods, Referral and Consultation/statistics & numerical data, Time Factors, Medical Informatics Computing, Chile, Reproducibility of Results, Medicine
8.
Rev Med Chil ; 147(10): 1229-1238, 2019 Oct.
Article in Spanish | MEDLINE | ID: mdl-32186630

ABSTRACT

BACKGROUND: Free text poses a challenge in health data analysis since the lack of structure makes the extraction and integration of information difficult, particularly in the case of massive data. An appropriate machine interpretation of electronic health records in Chile can unleash knowledge contained in large volumes of clinical texts, expanding clinical management and national research capabilities. AIM: To illustrate the use of a weighted frequency algorithm to find keywords. The algorithm was applied to the diagnostic suspicion field of the Chilean specialty consultation waiting list, for diseases not covered by the Chilean Explicit Health Guarantees plan. MATERIAL AND METHODS: The waiting lists for a first specialty consultation for the period 2008-2018 were obtained from 17 out of 29 Chilean health services, and a total of 2,592,925 diagnostic suspicions were identified. A natural language processing technique called Term Frequency-Inverse Document Frequency was used for the retrieval of diagnostic suspicion keywords. RESULTS: For each specialty, the four keywords with the highest weighted frequency were determined. Word clouds showing words weighted by their importance were created to obtain a visual representation. These are available at cimt.uchile.cl/lechile/. CONCLUSIONS: The algorithm made it possible to summarize unstructured clinical free-text data, improving its usefulness and accessibility.


Subject(s)
Data Mining/methods, Diagnostic Techniques and Procedures, Electronic Data Processing/methods, Information Storage and Retrieval/methods, Medical Records, Natural Language Processing, Chile, Humans, Medical Informatics Computing, Medicine, Referral and Consultation/statistics & numerical data, Reproducibility of Results, Time Factors