1.
Database (Oxford) ; 2024, 2024 Sep 10.
Article in English | MEDLINE | ID: mdl-39259689

ABSTRACT

This paper presents a transformer-based approach for symptom Named Entity Recognition (NER) in Spanish clinical texts and multilingual entity linking on the SympTEMIST dataset. For Spanish NER, we fine-tune a RoBERTa-based token-level classifier with Bidirectional Long Short-Term Memory and conditional random field layers on an augmented train set, achieving an F1 score of 0.73. Entity linking is performed via a hybrid, dictionary-assisted approach: candidates are generated from a knowledge base of Unified Medical Language System aliases using the cross-lingual SapBERT, and the top candidates are reranked with GPT-3.5. The entity linking approach shows consistent results across multiple languages, reaching 0.73 accuracy on the SympTEMIST multilingual dataset, and achieves 0.6123 accuracy on the Spanish entity linking task, surpassing the current top score for this subtask. Database URL: https://github.com/svassileva/symptemist-multilingual-linking.
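
A minimal sketch of the candidate-generation step described above: embed the mention and the knowledge-base aliases with a cross-lingual SapBERT-style encoder and rank by cosine similarity. The checkpoint name, pooling choice, and the toy alias list are assumptions for illustration, not the paper's exact setup; the GPT-3.5 reranking stage is omitted.

```python
# Hedged sketch: SapBERT-style candidate generation for entity linking.
# Checkpoint, pooling, and toy KB are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

CHECKPOINT = "cambridgeltl/SapBERT-UMLS-2020AB-all-lang-from-XLMR"  # assumed cross-lingual SapBERT

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModel.from_pretrained(CHECKPOINT).eval()

def embed(texts):
    """Return L2-normalised [CLS] embeddings for a list of strings."""
    batch = tokenizer(texts, padding=True, truncation=True, max_length=32, return_tensors="pt")
    with torch.no_grad():
        cls = model(**batch).last_hidden_state[:, 0]          # [CLS] token
    return torch.nn.functional.normalize(cls, dim=-1)

# Toy knowledge base of UMLS-style (CUI, alias) pairs -- illustrative only.
kb = [("C0015967", "fiebre"), ("C0015967", "fever"), ("C0011991", "diarrea")]
alias_vecs = embed([alias for _, alias in kb])

def candidates(mention, k=2):
    """Rank KB aliases by cosine similarity to the mention; top-k would go to the reranker."""
    sims = embed([mention]) @ alias_vecs.T                     # cosine similarity (normalised)
    top = torch.topk(sims[0], k=min(k, len(kb)))
    return [(kb[int(i)][0], kb[int(i)][1], float(s)) for s, i in zip(top.values, top.indices)]

print(candidates("fiebre alta"))  # e.g. [('C0015967', 'fiebre', ...), ...]
```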


Subject(s)
Multilingualism , Humans , Natural Language Processing , Unified Medical Language System
2.
Med Image Anal ; 97: 103303, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39154617

ABSTRACT

The increasing availability of biomedical data creates valuable resources for developing new deep learning algorithms to support experts, especially in domains where collecting large volumes of annotated data is not trivial. Biomedical data include several modalities containing complementary information, such as medical images and reports: images are often large and encode low-level information, while reports include a summarized, high-level description of the findings and often concern only a small part of the image. However, few methods can effectively link the visual content of images with the textual content of reports, preventing medical specialists from fully benefiting from the recent opportunities offered by deep learning models. This paper introduces a multimodal architecture that creates a robust biomedical data representation by encoding fine-grained text representations within image embeddings. The architecture aims to tackle data scarcity (combining supervised and self-supervised learning) and to create multimodal biomedical ontologies. The architecture is trained on over 6,000 colon whole slide images (WSIs), paired with the corresponding reports, collected from two digital pathology workflows. The evaluation of the multimodal architecture involves three tasks: WSI classification (on data from the pathology workflows and from public repositories), multimodal data retrieval, and linking between textual and visual concepts. Notably, the latter two tasks are available by architectural design without further training, showing that the multimodal architecture can be adopted as a backbone for specific tasks. The multimodal data representation outperforms the unimodal one on the classification of colon WSIs and halves the data needed to reach accurate performance, reducing the computational power required and thus the carbon footprint. The combination of images and reports via self-supervised algorithms makes it possible to mine databases and extract new information without requiring new expert annotations. In particular, the multimodal visual ontology, linking semantic concepts to images, may pave the way to advancements in medicine and biomedical analysis domains, not limited to histopathology.
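
A minimal sketch of one way to align report and image embeddings in a shared space with a contrastive (CLIP-style) objective. The encoders, feature dimensions, and loss are assumptions for illustration, not the architecture described in the paper.

```python
# Hedged sketch: contrastive alignment of image and report embeddings (InfoNCE).
# Dimensions and encoders are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedder(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, shared_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, shared_dim)   # projects pooled WSI features
        self.txt_proj = nn.Linear(txt_dim, shared_dim)   # projects pooled report features
        self.logit_scale = nn.Parameter(torch.tensor(2.0))

    def forward(self, img_feats, txt_feats):
        img = F.normalize(self.img_proj(img_feats), dim=-1)
        txt = F.normalize(self.txt_proj(txt_feats), dim=-1)
        logits = self.logit_scale.exp() * img @ txt.T    # pairwise similarities
        targets = torch.arange(len(img))                 # matched pairs lie on the diagonal
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.T, targets))

# Toy usage with random features standing in for WSI and report encoders.
model = JointEmbedder()
loss = model(torch.randn(8, 2048), torch.randn(8, 768))
loss.backward()
```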


Subject(s)
Deep Learning , Humans , Algorithms , Image Interpretation, Computer-Assisted/methods , Information Storage and Retrieval/methods , Image Processing, Computer-Assisted/methods
3.
J Pathol Inform ; 14: 100332, 2023.
Article in English | MEDLINE | ID: mdl-37705689

ABSTRACT

Computational pathology can significantly benefit from ontologies to standardize the employed nomenclature and help with knowledge extraction processes for high-quality annotated image datasets. The end goal is to reach a shared model for digital pathology that overcomes data variability and integration problems. Indeed, data annotation in such a specific domain is still an unsolved challenge, and datasets cannot be readily reused in diverse contexts due to heterogeneity of the adopted labels, multilingualism, and different clinical practices. Material and methods: This paper presents the ExaMode ontology, modeling the histopathology process by considering 3 key cancer diseases (colon, cervical, and lung tumors) and celiac disease. The ExaMode ontology has been designed bottom-up in an iterative fashion with continuous feedback and validation from pathologists and clinicians. The ontology is organized into 5 semantic areas that define an ontological template to model any disease of interest in histopathology. Results: The ExaMode ontology is currently being used as a common semantic layer in: (i) an entity linking tool for the automatic annotation of medical records; (ii) a web-based collaborative annotation tool for histopathology text reports; and (iii) a software platform for building holistic solutions integrating multimodal histopathology data. Discussion: The ExaMode ontology is a key means of storing data in a graph database according to the RDF data model. The creation of an RDF dataset can help develop more accurate algorithms for image analysis, especially in the field of digital pathology. This approach allows for seamless data integration and a unified query access point, from which relevant clinical insights about the considered diseases can be extracted using SPARQL queries.
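
A minimal sketch of querying an RDF dataset of ontology-annotated reports with SPARQL via rdflib, as the discussion describes. The file name, namespace, class, and predicate names are hypothetical placeholders, not the actual ExaMode vocabulary.

```python
# Hedged sketch: SPARQL access to an RDF export of annotated reports.
# Namespace and predicates are placeholders, not the ExaMode ontology terms.
from rdflib import Graph

g = Graph()
g.parse("examode_reports.ttl", format="turtle")   # hypothetical RDF dataset

QUERY = """
PREFIX exa: <https://example.org/examode#>       # placeholder namespace
SELECT ?report ?diagnosis
WHERE {
  ?report a exa:PathologyReport ;
          exa:hasDiagnosis ?diagnosis .
}
LIMIT 10
"""

for report, diagnosis in g.query(QUERY):
    print(report, diagnosis)
```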

4.
J Pathol Inform ; 13: 100139, 2022.
Article in English | MEDLINE | ID: mdl-36268087

ABSTRACT

Exa-scale volumes of medical data have been produced for decades. In most cases, the diagnosis is reported in free text, encoding medical knowledge that is still largely unexploited. To enable decoding the medical knowledge contained in reports, we propose an unsupervised knowledge extraction system combining a rule-based expert system with pre-trained Machine Learning (ML) models, namely the Semantic Knowledge Extractor Tool (SKET). Combining rule-based techniques and pre-trained ML models yields high-accuracy knowledge extraction. This work demonstrates the viability of unsupervised Natural Language Processing (NLP) techniques to extract critical information from cancer reports, opening opportunities such as data mining for knowledge extraction, precision medicine applications, structured report creation, and multimodal learning. SKET is a practical, unsupervised approach to extracting knowledge from pathology reports, which opens up unprecedented opportunities to exploit textual and multimodal medical information in clinical practice. We also propose SKET eXplained (SKET X), a web-based system providing visual explanations of the algorithmic decisions taken by SKET. SKET X is designed and developed to support pathologists and domain experts in understanding SKET predictions, possibly driving further improvements to the system.
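
A minimal sketch of the rule-based half of such a pipeline: a small concept dictionary matched against a pathology report. The trigger terms and concept names are illustrative assumptions; SKET's actual rules and its pre-trained ML components are considerably richer.

```python
# Hedged sketch: dictionary/regex concept extraction from a pathology report.
# Terms and concepts are illustrative, not SKET's actual rule set.
import re

CONCEPTS = {               # illustrative trigger pattern -> concept mapping
    r"\badenocarcinoma\b": "Colon adenocarcinoma",
    r"\b(tubular|tubulovillous)\s+adenoma\b": "Colon adenoma",
    r"\bhigh[- ]grade dysplasia\b": "Dysplasia, high grade",
}

def extract_concepts(report_text):
    """Return the set of concepts whose trigger patterns occur in the report."""
    text = report_text.lower()
    return {concept for pattern, concept in CONCEPTS.items()
            if re.search(pattern, text)}

print(extract_concepts("Tubulovillous adenoma with high-grade dysplasia."))
# {'Colon adenoma', 'Dysplasia, high grade'}
```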

5.
NPJ Digit Med ; 5(1): 102, 2022 Jul 22.
Article in English | MEDLINE | ID: mdl-35869179

ABSTRACT

The digitalization of clinical workflows and the increasing performance of deep learning algorithms are paving the way towards new methods for tackling cancer diagnosis. However, the availability of medical specialists to annotate digitized images and free-text diagnostic reports does not scale with the need for the large datasets required to train robust computer-aided diagnosis methods that can target the high variability of clinical cases and data produced. This work proposes and evaluates an approach to eliminate the need for manual annotations to train computer-aided diagnosis tools in digital pathology. The approach includes two components: one automatically extracts semantically meaningful concepts from diagnostic reports, and the other uses them as weak labels to train convolutional neural networks (CNNs) for histopathology diagnosis. The approach is trained (through 10-fold cross-validation) on 3,769 clinical images and reports, provided by two hospitals, and tested on over 11,000 images from private and publicly available datasets. The CNN, trained with automatically generated labels, is compared with the same architecture trained with manual labels. Results show that combining text analysis and end-to-end deep neural networks allows building computer-aided diagnosis tools that reach solid performance (micro-accuracy = 0.908 at image level) based only on existing clinical data, without the need for manual annotations.
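
A minimal sketch of the weak-supervision idea: turn report-derived concepts into image-level class labels and use them to fine-tune a CNN. The label map, backbone, and training loop are illustrative assumptions, not the paper's exact pipeline.

```python
# Hedged sketch: report-derived concepts as weak labels for CNN training.
# Classes, backbone, and loop are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

LABELS = ["normal", "adenoma", "adenocarcinoma"]           # illustrative classes

def weak_label(report_concepts):
    """Pick an image-level class from concepts extracted from the report."""
    if "Colon adenocarcinoma" in report_concepts:
        return LABELS.index("adenocarcinoma")
    if "Colon adenoma" in report_concepts:
        return LABELS.index("adenoma")
    return LABELS.index("normal")

model = models.resnet18(weights=None)                      # backbone choice is an assumption
model.fc = nn.Linear(model.fc.in_features, len(LABELS))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(images, report_concepts_batch):
    """One weakly supervised step: labels come from reports, not manual annotation."""
    targets = torch.tensor([weak_label(c) for c in report_concepts_batch])
    optimizer.zero_grad()
    loss = criterion(model(images), targets)
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy call with random tensors standing in for image patches.
print(train_step(torch.randn(2, 3, 224, 224), [{"Colon adenoma"}, set()]))
```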

6.
Health Inf Sci Syst ; 5(1): 3, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29038733

ABSTRACT

BACKGROUND: Studying comorbidities of disorders is important for detection and prevention. Frequent patterns of diseases can be discovered through retrospective analysis of population data, by filtering events with common properties and similar significance. Most frequent pattern mining methods do not consider contextual information about the extracted patterns. Further data mining developments might enable more efficient applications in specific tasks like comorbidity identification. METHODS: We propose a cascade data mining approach for frequent pattern mining enriched with context information, including a new algorithm, MIxCO, for maximal frequent pattern mining. Text mining tools extract entities from free text and deliver additional context attributes beyond the structured information about the patients. RESULTS: The proposed approach was tested using pseudonymised reimbursement requests (outpatient records) submitted to the Bulgarian National Health Insurance Fund in 2010-2016 for more than 5 million citizens yearly. Experiments were run on 3 data collections. Some known comorbidities of Schizophrenia, Hyperprolactinemia and Diabetes Mellitus Type 2 are confirmed; novel hypotheses about stable comorbidities are generated. The evaluation shows that MIxCO is efficient for big, dense datasets. CONCLUSION: Explicating maximal frequent itemsets enables building hypotheses concerning the relationships between the exogenous and endogenous factors triggering the formation of these sets. MIxCO will help to identify risk groups of patients with a predisposition to develop socially significant disorders like diabetes. This will turn static archives like the Diabetes Register in Bulgaria into a powerful alerting and predictive framework.
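
A minimal, naive illustration of the underlying notion of a maximal frequent itemset (a frequent itemset with no frequent superset) on toy "patient diagnosis" transactions. This is not the MIxCO algorithm, which is optimized for large dense datasets and enriches patterns with context attributes; the codes below are placeholders.

```python
# Hedged sketch: brute-force maximal frequent itemsets on toy data.
# Not MIxCO; for illustration of the concept only.
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    items = sorted({i for t in transactions for i in t})
    frequent = []
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            support = sum(set(combo) <= t for t in transactions) / len(transactions)
            if support >= min_support:
                frequent.append((frozenset(combo), support))
                found = True
        if not found:            # Apriori property: no larger frequent sets exist
            break
    return frequent

def maximal(frequent):
    """Keep only itemsets with no frequent proper superset."""
    sets = [s for s, _ in frequent]
    return [(s, sup) for s, sup in frequent
            if not any(s < other for other in sets)]

# Toy transactions; ICD-style codes are placeholders.
tx = [{"E11", "I10"}, {"E11", "I10", "F20"}, {"E11", "I10"}, {"F20"}]
print(maximal(frequent_itemsets(tx, min_support=0.5)))
# [(frozenset({'F20'}), 0.5), (frozenset({'E11', 'I10'}), 0.75)]
```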

7.
Stud Health Technol Inform ; 169: 527-31, 2011.
Article in English | MEDLINE | ID: mdl-21893805

ABSTRACT

Information Extraction (IE) from medical texts aims at the automatic recognition of entities and relations of interest. IE is based on shallow analysis and considers only sentences containing important words. Thus, IE of drugs from discharge letters can identify some past or future medication events as 'current'. This article presents heuristics that enable filtering the drugs actually taken by the patient during hospitalization. These heuristics are based on the default structure of the Patient Record (PR) and on linguistic expressions signaling temporal and conditional markers. They are integrated into a system for drug extraction from hospital PRs in Bulgarian. Evaluation results are summarized as well.
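
A minimal sketch of the filtering idea: discard drug mentions whose sentence carries a past/future or conditional marker and keep the rest as 'current'. The marker lists are English stand-ins for the Bulgarian cues the paper relies on, not its actual rules.

```python
# Hedged sketch: marker-based filtering of 'current' medication mentions.
# Marker lists are illustrative stand-ins for the Bulgarian heuristics.
PAST_FUTURE_MARKERS = {"before admission", "discontinued", "to be started", "at discharge"}
CONDITIONAL_MARKERS = {"if needed", "in case of"}

def is_current_medication(sentence):
    s = sentence.lower()
    return not any(m in s for m in PAST_FUTURE_MARKERS | CONDITIONAL_MARKERS)

def filter_current(drug_mentions):
    """drug_mentions: list of (drug_name, sentence) pairs from the IE step."""
    return [drug for drug, sentence in drug_mentions if is_current_medication(sentence)]

mentions = [("metformin", "Metformin 850 mg twice daily."),
            ("enalapril", "Enalapril was discontinued before admission.")]
print(filter_current(mentions))   # ['metformin']
```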


Subject(s)
Data Mining/methods , Medical Informatics/methods , Medical Records Systems, Computerized/organization & administration , Algorithms , Clinical Pharmacy Information Systems , Drug-Related Side Effects and Adverse Reactions/prevention & control , Electronic Data Processing , Hospital Administration , Humans , Medication Errors/prevention & control , Natural Language Processing , Software
8.
Stud Health Technol Inform ; 166: 119-28, 2011.
Article in English | MEDLINE | ID: mdl-21685617

ABSTRACT

This paper presents methods for shallow Information Extraction (IE) from the free-text zones of hospital Patient Records (PRs) in Bulgarian within the Patient Safety through Intelligent Procedures in medication (PSIP) project. We automatically extract information about drug names, dosage, modes and frequency, and assign the corresponding ATC code to each medication event. Using various modules for rule-based text analysis, our IE components in PSIP perform a significant amount of symbolic computation. We address negative statements, elliptical constructions, typical conjunctive phrases, and simple inferences concerning temporal constraints, and finally aim at the assignment of the drug ATC code to the extracted medication events, which further complicates the extraction algorithm. The prototype of the system was used for experiments with a training corpus containing 1,300 PRs, and the evaluation results are obtained on a test corpus containing 6,200 PRs. The extraction accuracy (F-score) for drug names is 98.42% and for dose 93.85%.
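
A minimal sketch of the extraction idea: regex rules for drug name plus dose, followed by a dictionary lookup for the ATC code. The patterns, drug list, and dictionary are illustrative; the PSIP components use far richer rule sets and additionally handle negation, ellipses, and temporal constraints.

```python
# Hedged sketch: rule-based drug/dose extraction with ATC lookup.
# Patterns and toy dictionary are illustrative, not the PSIP rule set.
import re

ATC = {"metformin": "A10BA02", "enalapril": "C09AA02"}    # toy dictionary

DRUG_PATTERN = re.compile(
    r"(?P<drug>metformin|enalapril)\s+(?P<dose>\d+(?:\.\d+)?)\s*(?P<unit>mg)",
    re.IGNORECASE,
)

def extract_medication_events(text):
    events = []
    for m in DRUG_PATTERN.finditer(text):
        drug = m.group("drug").lower()
        events.append({
            "drug": drug,
            "dose": f"{m.group('dose')} {m.group('unit')}",
            "atc": ATC.get(drug),
        })
    return events

print(extract_medication_events("Therapy: Metformin 850 mg b.i.d., Enalapril 10 mg daily."))
# [{'drug': 'metformin', 'dose': '850 mg', 'atc': 'A10BA02'}, ...]
```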


Subject(s)
Data Mining/methods , Hospital Administration , Medical Records Systems, Computerized/organization & administration , Drug-Related Side Effects and Adverse Reactions/prevention & control , Humans , Neural Networks, Computer
9.
Stud Health Technol Inform ; 166: 260-9, 2011.
Article in English | MEDLINE | ID: mdl-21685632

ABSTRACT

This paper presents experiments in automatic Information Extraction of medication events, diagnoses, and laboratory tests from hospital patient records, in order to increase the completeness of the description of the episode of care. Each patient record in our hospital information system contains structured data and text descriptions, including full discharge letters. From these letters, we automatically extract information about the medication just before and at the time of hospitalization, especially for drugs prescribed to the patient but not delivered by the hospital pharmacy; we also extract values of lab tests not performed and not registered in our laboratory, as well as all non-encoded diagnoses described only in the free text of discharge letters. Thus we increase the availability of suitable and accurate information about the hospital stay and the outpatient segment of care before the hospitalization. Information Extraction also helps to understand the clinical and organizational decisions concerning the patient without increasing the complexity of the structured health record.


Subject(s)
Continuity of Patient Care/organization & administration , Data Mining/methods , Drug-Related Side Effects and Adverse Reactions/prevention & control , Medical Records Systems, Computerized/organization & administration , Semantics , Diagnostic Techniques and Procedures , Humans , Information Systems/organization & administration , Quality of Health Care/organization & administration , Software Validation