Results 1 - 20 of 773
1.
BMC Bioinformatics ; 25(1): 273, 2024 Aug 21.
Article in English | MEDLINE | ID: mdl-39169321

ABSTRACT

BACKGROUND: There has been considerable advancement in AI technologies such as LLMs and machine learning to support biomedical knowledge discovery. MAIN BODY: We propose a novel biomedical neural search service called 'VAIV Bio-Discovery', which supports enhanced knowledge discovery and document search on unstructured text such as PubMed. It mainly handles information related to chemical compounds/drugs, genes/proteins, diseases, and their interactions (chemical compound/drug-protein/gene interactions, including drug-target, drug-drug, and drug-disease). To provide comprehensive knowledge, the system offers four search options: basic search, entity search, interaction search, and natural language search. We employ T5slim_dec, which adapts the autoregressive generation task of the T5 (text-to-text transfer transformer) to the interaction extraction task by removing the self-attention layer in the decoder block. It also assists in interpreting research findings by summarizing the retrieved search results for a given natural language query with Retrieval Augmented Generation (RAG). The search engine is built with a hybrid method that combines neural search with the probabilistic search method BM25. CONCLUSION: As a result, our system can better understand the context, semantics and relationships between terms within the document, enhancing search accuracy. This research contributes to the rapidly evolving biomedical field by introducing a new service to access and discover relevant knowledge.
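The hybrid ranking mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how the two signals are combined, so the weighted sum, the `alpha` parameter, the min-max normalization, and the hard-coded `dense` similarities (standing in for a neural encoder) are all assumptions made for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = k1 * (1 - b + b * len(d) / avgdl)  # length normalization
            s += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(s)
    return scores

def min_max(xs):
    """Rescale scores to [0, 1] so the two signals are comparable."""
    lo, hi = min(xs), max(xs)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]

def hybrid_scores(lexical, dense, alpha=0.5):
    """Weighted sum of normalized lexical and dense scores (assumed fusion rule)."""
    return [alpha * l + (1 - alpha) * d
            for l, d in zip(min_max(lexical), min_max(dense))]

# Toy corpus; the dense similarities are placeholders, not model output.
docs = [["aspirin", "inhibits", "cox1"],
        ["egfr", "mutation", "lung", "cancer"],
        ["aspirin", "aspirin", "dose"]]
query = ["aspirin", "cox1"]
lexical = bm25_scores(query, docs)
dense = [0.2, 0.9, 0.4]
ranking = hybrid_scores(lexical, dense, alpha=0.5)
```

Setting `alpha=1.0` reduces the ranking to BM25 alone, and `alpha=0.0` to the dense signal alone, which makes the trade-off easy to inspect.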


Subject(s)
Natural Language Processing , Data Mining/methods , Knowledge Discovery/methods , PubMed , Search Engine , Machine Learning , Information Storage and Retrieval/methods , Neural Networks, Computer
3.
Stud Health Technol Inform ; 316: 1622-1626, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176521

ABSTRACT

Personalized medicine enables precise tumor treatment for a patient's molecular genetic profile. To devise optimal targeted treatment plans for patients in a molecular tumor board, physicians must consider alterations at the gene and protein levels as well as cancer cell phenotypes. Machine learning can uncover buried patterns, extract pivotal information, and unveil corresponding insights from available data. Publicly available datasets provide the amounts of data necessary. This work outlines the efficacy of various machine learning algorithms which could eventually serve as clinical decision support in a precision oncology setting. Leveraging algorithms including Random Forest, Decision Tree, XGBoost, Logistic Regression, Gaussian Naive Bayes, k-nearest neighbors, and AdaBoost, we conducted two experiments on the breast invasive carcinoma dataset. The incorporated data include patient, molecular, and treatment data. The aim of the investigation was to predict medication treatment or type of treatment based on genetic profile. After preprocessing and application of ML algorithms, the first results were promising. However, multiple factors challenge application in clinical care settings, and the limitations must be considered carefully.


Subject(s)
Data Mining , Decision Support Systems, Clinical , Machine Learning , Precision Medicine , Data Mining/methods , Humans , Breast Neoplasms/genetics , Breast Neoplasms/therapy , Breast Neoplasms/drug therapy , Algorithms , Female
4.
Stud Health Technol Inform ; 316: 1642-1646, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176525

ABSTRACT

This paper presents a comprehensive workflow for integrating revolving events into the transitive sequential pattern mining (tSPM+) algorithm and Machine Learning for Health Outcomes (MLHO) framework, emphasizing best practices and pitfalls in its application. We emphasize feature engineering and visualization techniques, demonstrating their efficacy in capturing temporal relationships. Applied to an EGFR lung cancer cohort, our approach showcases reliable temporal insights even in a small dataset. This work highlights the importance of temporal nuances in healthcare data analysis, paving the way for improved disease understanding and patient care.


Subject(s)
Algorithms , Data Mining , Lung Neoplasms , Machine Learning , Lung Neoplasms/therapy , Humans , Data Mining/methods , Workflow
5.
Stud Health Technol Inform ; 316: 1709-1713, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176539

ABSTRACT

The increasing volume of unstructured textual data in healthcare, particularly in nursing care reports, presents both challenges and opportunities for enhancing patient care and operational efficiency. This study explores the application of Latent Dirichlet Allocation (LDA) topic modeling to analyze free-text nursing narratives from inpatient stays in three different clinics, aiming to uncover the latent thematic structures within. Utilizing the R programming environment and the visualization tool LDAvis, we identified three main themes: "Patient Well-being," "Patient Mobility and Care Activities," and "Treatment and Pain Management," the latter combining two closely related but initially distinct topics due to their overlapping content. Our findings demonstrate the potential of LDA topic modeling in extracting meaningful insights from nursing narratives, which could inform patient care strategies and healthcare practices. However, the study also highlights significant challenges associated with the method, including the sensitivity to parameter settings, the lack of updates for key software packages, and concerns about reproducibility. These issues highlight the need for meticulous parameter validation and the exploration of alternative text analysis methodologies for future research. By addressing these methodological challenges and emphasizing the importance of comparative method analysis, this study contributes to the advancement of text analytics in healthcare. It opens avenues for further research aimed at developing more robust, efficient, and accessible tools for analyzing free-text data, thereby enhancing the ability of healthcare professionals to use unstructured data to improve decision making and patient outcomes.


Subject(s)
Narration , Humans , Natural Language Processing , Nursing Records , Nursing Care , Data Mining/methods
6.
Stud Health Technol Inform ; 316: 1694-1698, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176536

ABSTRACT

In many healthcare facilities, the prescription of drugs is done only in a semi-structured manner, using free-text fields where various information is often mixed. Therefore, automatic processing, especially for secondary use such as research purposes, is often challenging. This paper compares various approaches that identify and classify the various parts of these free-text fields in German, namely simple Levenshtein-based, rule-based and CRF (conditional random field)-based approaches. Our results show that an F1-score >90% can be achieved with both the rule-based and the CRF-based approach, with the CRF-based approach even reaching nearly 95%.
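As a rough illustration of what a Levenshtein-based approach could look like, the sketch below computes edit distance and maps a possibly misspelled token to the nearest entry in a vocabulary. The abstract does not describe the actual matching procedure; the `closest_term` helper, the `max_distance` threshold, and the drug names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def closest_term(token, vocabulary, max_distance=2):
    """Nearest vocabulary entry, or None if nothing is within max_distance."""
    best = min(vocabulary, key=lambda term: levenshtein(token.lower(), term.lower()))
    if levenshtein(token.lower(), best.lower()) <= max_distance:
        return best
    return None

# Hypothetical drug vocabulary for illustration only.
vocabulary = ["Ibuprofen", "Metoprolol"]
match = closest_term("Ibuprofn", vocabulary)
```

A distance threshold of this kind tolerates typing errors but cannot tell a drug name from a dose or a frequency, which is one plausible reason rule-based and CRF-based approaches are compared against it.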


Subject(s)
Natural Language Processing , Germany , Humans , Drug Prescriptions , Data Mining/methods , Electronic Prescribing , Electronic Health Records
7.
Stud Health Technol Inform ; 316: 1775-1779, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176561

ABSTRACT

Hand-labelling clinical corpora can be costly and inflexible, requiring re-annotation every time new classes need to be extracted. PICO (Participant, Intervention, Comparator, Outcome) information extraction can expedite conducting systematic reviews to answer clinical questions. However, PICO frequently extends to other entities such as Study type and design, trial context, and timeframe, requiring manual re-annotation of existing corpora. In this paper, we adapt Snorkel's weak supervision methodology to extend clinical corpora to new entities without extensive hand labelling. Specifically, we enrich the EBM-PICO corpus with new entities through an example of "Study type and design" extraction. Using weak supervision, we obtain programmatic labels on 4,081 EBM-PICO documents, achieving an F1-score of 85.02% on the test set.
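The weak supervision idea can be sketched with a few keyword-based labeling functions whose votes are combined. This is a simplification under stated assumptions: Snorkel fits a label model over labeling-function outputs, whereas the sketch uses a plain majority vote, and the three labeling functions and their keywords are hypothetical rather than those used in the paper.

```python
from collections import Counter

ABSTAIN, OTHER, STUDY_TYPE = -1, 0, 1

# Hypothetical labeling functions: each votes on one sentence or abstains.
def lf_randomized(text):
    lowered = text.lower()
    return STUDY_TYPE if "randomized" in lowered or "randomised" in lowered else ABSTAIN

def lf_design_keywords(text):
    keywords = ("double-blind", "placebo-controlled", "cohort study", "crossover")
    return STUDY_TYPE if any(k in text.lower() for k in keywords) else ABSTAIN

def lf_results_sentence(text):
    return OTHER if text.lower().startswith("results") else ABSTAIN

LABELING_FUNCTIONS = [lf_randomized, lf_design_keywords, lf_results_sentence]

def majority_vote(votes):
    """Most common non-abstain vote; abstain on ties or when nobody votes."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]

def weak_label(text):
    return majority_vote([lf(text) for lf in LABELING_FUNCTIONS])

label = weak_label("A randomized, double-blind trial of drug X.")
```

Adding a new entity class then means writing new labeling functions rather than re-annotating the corpus by hand, which is the flexibility the abstract emphasizes.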


Subject(s)
Systematic Reviews as Topic , Natural Language Processing , Humans , Information Storage and Retrieval/methods , Data Mining/methods
8.
Stud Health Technol Inform ; 316: 1780-1784, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176562

ABSTRACT

Radiology reports contain crucial patient information, in addition to images, that can be automatically extracted for secondary uses such as clinical support and research for diagnosis. We tested several classifiers to classify 1,218 breast MRI reports in French from two Swiss clinical centers. Logistic regression performed better for both internal (accuracy > 0.95 and macro-F1 > 0.86) and external data (accuracy > 0.81 and macro-F1 > 0.41). Automating this task will facilitate efficient extraction of targeted clinical parameters and provide a good basis for future annotation processes through automatic pre-annotation.
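The gap between accuracy and macro-F1 reported for the external data is typical of imbalanced classes. The sketch below shows the effect on a toy example; the class names and counts are invented for illustration and are not taken from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_values = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_values.append(2 * tp / denom if denom else 0.0)
    return sum(f1_values) / len(f1_values)

# Hypothetical imbalanced test set: the classifier never predicts the rare class.
y_true = ["benign"] * 9 + ["malignant"]
y_pred = ["benign"] * 10

acc = accuracy(y_true, y_pred)   # 9 of 10 correct
mf1 = macro_f1(y_true, y_pred)   # mean of 18/19 and 0
```

Here accuracy is 0.9 while macro-F1 is below 0.5, because the missed minority class contributes an F1 of zero to the average.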


Subject(s)
Breast Neoplasms , Magnetic Resonance Imaging , Humans , Female , Breast Neoplasms/diagnostic imaging , France , Radiology Information Systems , Electronic Health Records , Natural Language Processing , Switzerland , Data Mining
9.
Stud Health Technol Inform ; 316: 214-215, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176711

ABSTRACT

Automatic extraction of body-text within clinical PDF documents is necessary to enhance downstream NLP tasks but remains a challenge. This study presents an unsupervised algorithm designed to extract body-text by leveraging a large volume of data. Using DBSCAN clustering over aggregate pages, our method extracts and organizes text blocks using their content and coordinates. Evaluation results demonstrate precision scores ranging from 0.82 to 0.98, recall scores from 0.62 to 0.94, and F1-scores from 0.71 to 0.96 across various medical specialty sources. Future work includes dynamic parameter adjustments for improved accuracy and using larger datasets.


Subject(s)
Natural Language Processing , Algorithms , Data Mining/methods , Humans , Electronic Health Records , Unsupervised Machine Learning
10.
Stud Health Technol Inform ; 316: 374-375, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176755

ABSTRACT

There is rapid growth in the volume of data in the cancer field, and fine-grained classification is in high demand, especially for interdisciplinary and collaborative research. There is thus a need to establish a multi-label classifier with higher resolution to reduce the burden of screening articles for clinical relevance. This research trains a scalable multi-label classifier for classifying literature on cancer research directly at the publication level. Firstly, a corpus was divided into a training set and a testing set at a ratio of 7:3. Secondly, we compared the performance of classifiers developed by "PubMedBERT + TextRNN" and "BioBERT + TextRNN" with ICRP CT. Finally, the classifier was obtained based on the optimal combination "PubMedBERT + TextRNN", with P = 0.952014, R = 0.936696, F1 = 0.931664. The quantitative comparisons demonstrate that the resulting classifier is fit for high-resolution classification of cancer literature at the publication level to support accurate retrieval and academic statistics.
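The evaluation setup described here, a 7:3 split followed by precision, recall and F1 for a multi-label classifier, can be sketched as follows. The abstract does not state which averaging scheme was used; micro-averaging over label sets is assumed here, and the label names are placeholders.

```python
import random

def split_corpus(items, train_ratio=0.7, seed=0):
    """Shuffle reproducibly and split into training and testing sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def micro_prf(gold, predicted):
    """Micro-averaged precision, recall and F1 over per-document label sets."""
    tp = sum(len(g & p) for g, p in zip(gold, predicted))
    fp = sum(len(p - g) for g, p in zip(gold, predicted))
    fn = sum(len(g - p) for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

train, test = split_corpus(range(10))

# Placeholder label sets for two documents.
gold = [{"a", "b"}, {"c"}]
predicted = [{"a"}, {"c", "d"}]
p, r, f1 = micro_prf(gold, predicted)
```

With micro-averaging, F1 is the harmonic mean of P and R and therefore lies between them; with macro-averaging over labels it need not, so the averaging choice matters when reading reported triples.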


Subject(s)
Neoplasms , Neoplasms/classification , Humans , PubMed , Data Mining/methods
11.
Stud Health Technol Inform ; 316: 1795-1799, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176839

ABSTRACT

Radiology reports are an essential communication method for ensuring smooth workflow in healthcare. However, many of these reports are described in free text, and findings documented by radiologists may not be adequately addressed. In this study, focusing on pulmonary nodules, we evaluated whether cases in which radiologists described follow-up as recommended were receiving appropriate treatment. Reports recommending follow-up for pulmonary nodules were automatically extracted using natural language processing. In our evaluation, out of 10,507 reports, 1,501 cases (14.3%) were classified as "reports recommending follow-up for pulmonary nodules." Among these, 958 cases underwent additional imaging tests within 400 days. From the remaining 543 cases, we randomly sampled 42 cases and conducted chart reviews by clinicians to confirm patient care status. Our assessment found that follow-up was not documented in 17 of the 42 cases (40.5%), indicating a high likelihood that appropriate care was not provided.
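The counts and percentages in this abstract are mutually consistent, which a short calculation confirms. Only the figures stated in the abstract are used; the variable names are illustrative.

```python
total_reports = 10507   # reports evaluated
flagged = 1501          # reports recommending follow-up for pulmonary nodules
followed_up = 958       # cases with additional imaging within 400 days
sampled = 42            # cases randomly sampled for chart review
undocumented = 17       # sampled cases with no documented follow-up

remaining = flagged - followed_up
flagged_pct = round(100 * flagged / total_reports, 1)
undocumented_pct = round(100 * undocumented / sampled, 1)

print(remaining, flagged_pct, undocumented_pct)
```

Note that the 40.5% figure applies to the 42 sampled charts, so it is an estimate for the 543 cases without follow-up imaging rather than a count over all of them.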


Subject(s)
Electronic Health Records , Natural Language Processing , Radiology Information Systems , Solitary Pulmonary Nodule , Humans , Solitary Pulmonary Nodule/diagnostic imaging , Lung Neoplasms/diagnostic imaging , Documentation , Data Mining/methods
12.
Stud Health Technol Inform ; 316: 1822-1826, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176845

ABSTRACT

We analyze five approaches to knowledge management in clinical decision support (CDS) systems: pattern recognition based on annotated imaging data, mining of stored structured medical data, text mining of published texts, computable knowledge design, and general or specific text corpora for large language models. Each method's strengths and limitations in automating clinical knowledge management while striving for a zero-error policy are evaluated, offering insights into their roles in enhancing healthcare through intelligent decision support. The study aims to inform decisions in the development of effective, transparent CDS systems in clinical and patient care settings.


Subject(s)
Data Mining , Decision Support Systems, Clinical , Knowledge Management , Data Mining/methods , Natural Language Processing , Humans , Electronic Health Records
13.
Stud Health Technol Inform ; 316: 1861-1865, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176854

ABSTRACT

Using clinical decision support systems (CDSSs) for breast cancer management requires extracting relevant patient data from textual reports, a complex task that machine learning methods perform efficiently but as black boxes. We proposed a rule-based natural language processing (NLP) method to automate the translation of breast cancer patient summaries into structured patient profiles suitable for input into the guideline-based CDSS of the DESIREE project. Our method encompasses named entity recognition (NER), relation extraction and structured data extraction to systematically organize patient data. The method demonstrated strong alignment with treatment recommendations generated for manually created patient profiles (gold standard), with only 2% of differences. Moreover, the NER pipeline achieved an average F1-score of 0.9 across the main entities (patient, side, and tumor), 0.87 for relation extraction, and 0.75 for contextual information, showing promising results for rule-based NLP.


Subject(s)
Breast Neoplasms , Decision Support Systems, Clinical , Electronic Health Records , Natural Language Processing , Humans , Breast Neoplasms/therapy , Female , Data Mining/methods , Machine Learning
14.
Stud Health Technol Inform ; 316: 756-760, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176904

ABSTRACT

This study evaluates the efficacy of a small large language model (sLLM) in extracting critical information from free-text pathology reports across multiple centers, addressing the challenges posed by the narrative and complex nature of these documents. Employing three variants of the Llama 2 model, with 7 billion, 13 billion, and 70 billion parameters, the research assesses model performance in both zero-shot and five-shot settings, offering insights into the impact of example-based learning. A specialized information extraction tool utilizing regular expressions for pattern identification serves as the benchmark for evaluating the models' accuracy. Conducted within a hospital's internal environment, the study emphasizes the clinical applicability of these findings. The results reveal significant variations in model performance, with the 70 billion parameter model achieving remarkable accuracy in the five-shot scenario, demonstrating the potential of sLLMs in enhancing the efficiency and accuracy of data extraction from pathology reports. The study highlights the importance of example-driven learning and the trade-offs between model size, accuracy, hallucination rates, and processing time. These findings contribute to the ongoing efforts to integrate advanced language models into clinical settings, potentially transforming patient care and biomedical research by mitigating the limitations of manual data extraction processes.
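The evaluation design, zero-shot versus five-shot prompting scored against a regular-expression extractor, can be sketched as below. The target field (tumor size), the pattern, the prompt wording, and the helper names are hypothetical; the abstract does not specify them, and no model call is made here.

```python
import re

# Hypothetical pattern for one field; the study's actual patterns are not given.
TUMOR_SIZE = re.compile(r"tumou?r size[:\s]+(\d+(?:\.\d+)?)\s*(mm|cm)", re.IGNORECASE)

def regex_extract(report):
    """Reference answer from the rule-based extractor, or None if absent."""
    m = TUMOR_SIZE.search(report)
    return f"{m.group(1)} {m.group(2).lower()}" if m else None

def build_prompt(report, examples=()):
    """Zero-shot when examples is empty; few-shot when (report, answer) pairs are given."""
    parts = ["Extract the tumor size from the pathology report. "
             "Answer with the value and unit only."]
    for example_report, example_answer in examples:
        parts.append(f"Report: {example_report}\nAnswer: {example_answer}")
    parts.append(f"Report: {report}\nAnswer:")
    return "\n\n".join(parts)

def agreement(model_answers, reports):
    """Share of reports where the model answer equals the regex reference."""
    hits = sum(answer == regex_extract(report)
               for answer, report in zip(model_answers, reports))
    return hits / len(reports)

report = "Tumor size: 2.3 cm, margins clear."
zero_shot = build_prompt(report)
five_shot = build_prompt(report, examples=[("Tumor size 11 mm.", "11 mm")] * 5)
```

Because the reference itself is rule-based, agreement of this kind measures how closely a model reproduces the regex output, not correctness against human annotation.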


Subject(s)
Natural Language Processing , Humans , Data Mining/methods , Electronic Health Records , Information Storage and Retrieval/methods , Machine Learning
15.
Stud Health Technol Inform ; 316: 894-898, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176937

ABSTRACT

With the objective of extracting new knowledge about rare diseases from social media messages, we evaluated three models on a Named Entity Recognition (NER) task, consisting of extracting phenotypes and treatments from social media messages. We trained the three models on a dataset with social media messages about Developmental and Epileptic Encephalopathies and more common diseases. This preliminary study revealed that CamemBERT and CamemBERT-bio exhibit similar performance on social media testimonials, slightly outperforming DrBERT. It also highlighted that their performance was lower on this type of data than on structured health datasets. Limitations, including a narrow focus on NER performance and dataset-specific evaluation, call for further research to fully assess model capabilities on larger and more diverse datasets.


Subject(s)
Social Media , France , Humans , Natural Language Processing , Data Mining/methods , Rare Diseases
16.
Stud Health Technol Inform ; 316: 899-903, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176938

ABSTRACT

Open source, lightweight and offline generative large language models (LLMs) hold promise for clinical information extraction due to their suitability to operate in secured environments using commodity hardware without token cost. By creating a simple lupus nephritis (LN) renal histopathology annotation schema and generating gold standard data, this study investigates prompt-based strategies using three state-of-the-art lightweight LLMs, namely BioMistral-DARE-7B (BioMistral), Llama-2-13B (Llama 2), and Mistral-7B-instruct-v0.2 (Mistral). We examine the performance of these LLMs within a zero-shot learning environment for renal histopathology report information extraction. Incorporating four prompting strategies, including combinations of batch prompt (BP), single task prompt (SP), chain of thought (CoT) and standard simple prompt (SSP), our findings indicate that both Mistral and BioMistral consistently demonstrated higher performance compared to Llama 2. Mistral recorded the highest performance, achieving an F1-score of 0.996 [95% CI: 0.993, 0.999] for extracting the numbers of various subtypes of glomeruli across all BP settings and 0.898 [95% CI: 0.871, 0.921] in extracting relational values of immune markers under the BP+SSP setting. This study underscores the capability of offline LLMs to provide accurate and secure clinical information extraction, which can serve as a promising alternative to their heavy-weight online counterparts.


Subject(s)
Lupus Nephritis , Natural Language Processing , Lupus Nephritis/pathology , Humans , Electronic Health Records , Data Mining/methods , Information Storage and Retrieval/methods
17.
Stud Health Technol Inform ; 316: 909-913, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176940

ABSTRACT

Electronic Health Records (EHRs) contain a wealth of unstructured patient data, making it challenging for physicians to make informed decisions. In this paper, we introduce a Natural Language Processing (NLP) approach for the extraction of therapies, diagnoses, and symptoms from ambulatory EHRs of patients with chronic Lupus disease. We aim to demonstrate the effort of a comprehensive pipeline where a rule-based system is combined with text segmentation, transformer-based topic analysis and clinical ontology, in order to enhance text preprocessing and automate the identification of rules. Our approach is applied to a sub-cohort of 56 patients, with a total of 750 EHRs written in Italian, achieving an accuracy and an F-score over 97% and 90%, respectively, in the three extracted domains. This work has the potential to be integrated with EHR systems to automate information extraction, minimizing human intervention and providing personalized digital solutions in the chronic Lupus disease domain.


Subject(s)
Electronic Health Records , Lupus Erythematosus, Systemic , Natural Language Processing , Humans , Chronic Disease , Data Mining/methods
18.
Stud Health Technol Inform ; 316: 949-950, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176948

ABSTRACT

In the field of medical data analysis, converting unstructured text documents into a structured format suitable for further use is a significant challenge. This study introduces an automated, locally deployed, data-privacy-secure pipeline that uses open-source Large Language Models (LLMs) with a Retrieval-Augmented Generation (RAG) architecture to convert German-language medical documents containing sensitive health-related information into a structured format. Testing on a proprietary dataset of 800 unstructured original medical reports demonstrated an accuracy of up to 90% for the pipeline's data extraction compared to data extracted manually by physicians and medical students. This highlights the pipeline's potential as a valuable tool for efficiently extracting relevant data from unstructured sources.


Subject(s)
Electronic Health Records , Natural Language Processing , Germany , Information Storage and Retrieval/methods , Humans , Computer Security , Data Mining/methods
19.
Stud Health Technol Inform ; 316: 944-948, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176947

ABSTRACT

In the ever-evolving landscape of medical research and healthcare, the abundance of scientific articles presents both a treasure trove of knowledge and a daunting challenge. Researchers, clinicians, and data scientists grapple with vast amounts of unstructured information, seeking to extract meaningful insights that can drive advancements in the biomedical domain, including research trends, patient care, drug discovery, and disease understanding. This paper applies topic extraction algorithms to breast cancer research to shed light on the current trends and the path to follow in this field. We used TextRank and Large Language Models (LLMs) via the TripleA tool to extract topics in the field, analyzing and comparing the results.


Subject(s)
Breast Neoplasms , Natural Language Processing , Humans , Data Mining/methods , Female , Biomedical Research , Algorithms , Periodicals as Topic
20.
Stud Health Technol Inform ; 316: 983-987, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176956

ABSTRACT

Modern generative artificial intelligence techniques like retrieval-augmented generation (RAG) may be applied in support of precision oncology treatment discussions. Experts routinely review published literature for evidence and recommendations of treatments in a labor-intensive process. A RAG pipeline may help reduce this effort by providing chunks of text from these publications to an off-the-shelf large language model (LLM), allowing it to answer related questions without any fine-tuning. This potential application is demonstrated by retrieving treatment relationships from a trusted data source (OncoKB) and reproducing over 80% of them by asking simple questions to an untrained Llama 2 model with access to relevant abstracts.
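The retrieve-then-prompt step of a RAG pipeline can be sketched in a few lines. This is an illustration under stated assumptions: retrieval here ranks chunks by simple token overlap rather than by the embedding search a real pipeline would likely use, the chunk texts are placeholders rather than real abstracts, and the call to the language model is left out.

```python
import re

def tokenize(text):
    """Lowercased set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, chunks, k=2):
    """Top-k chunks by token overlap with the question (stand-in for dense retrieval)."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:k]

def build_rag_prompt(question, chunks, k=2):
    """Prompt that asks the model to answer only from the retrieved context."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return ("Answer using only the context below.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {question}\nAnswer:")

# Placeholder chunks; not real publication text.
chunks = [
    "Drug A is reported as a treatment for tumors with gene X alterations.",
    "Gene Y expression was measured in healthy tissue.",
    "Drug B dosing was studied in a pediatric cohort.",
]
question = "Which drug treats gene X tumors?"
prompt = build_rag_prompt(question, chunks, k=1)
```

The resulting `prompt` string would be passed to an off-the-shelf model; since the model is not fine-tuned, answer quality depends largely on whether retrieval surfaces the right chunk.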


Subject(s)
Medical Oncology , Natural Language Processing , Precision Medicine , Humans , Artificial Intelligence , Neoplasms/therapy , Information Storage and Retrieval/methods , Data Mining/methods