Results 1 - 20 of 35
1.
Bioinformatics ; 40(Suppl 2): ii198-ii207, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-39230698

ABSTRACT

MOTIVATION: In the realm of precision medicine, effective patient stratification and disease subtyping demand innovative methodologies tailored for multi-omics data. Clustering techniques applied to multi-omics data have become instrumental in identifying distinct subgroups of patients, enabling a finer-grained understanding of disease variability. Meanwhile, clinical datasets are often small and must be aggregated from multiple hospitals. Online data sharing, however, is seen as a significant challenge due to privacy concerns, potentially impeding big data's role in medical advancements using machine learning. This work establishes a powerful framework for advancing precision medicine through unsupervised random forest-based clustering in combination with federated computing. RESULTS: We introduce a novel multi-omics clustering approach utilizing unsupervised random forests. The unsupervised nature of the random forest enables the determination of cluster-specific feature importance, unraveling key molecular contributors to distinct patient groups. Our methodology is designed for federated execution, a crucial aspect in the medical domain where privacy concerns are paramount. We have validated our approach on machine learning benchmark datasets as well as on cancer data from The Cancer Genome Atlas. Our method is competitive with the state-of-the-art in terms of disease subtyping, but at the same time substantially improves the cluster interpretability. Experiments indicate that local clustering performance can be improved through federated computing. AVAILABILITY AND IMPLEMENTATION: The proposed methods are available as an R-package (https://github.com/pievos101/uRF).


Subject(s)
Precision Medicine , Humans , Cluster Analysis , Precision Medicine/methods , Unsupervised Machine Learning , Machine Learning , Neoplasms , Privacy , Algorithms , Random Forest
2.
J Med Internet Res ; 26: e57852, 2024 Sep 26.
Article in English | MEDLINE | ID: mdl-39325515

ABSTRACT

BACKGROUND: Clinical narratives are essential components of electronic health records. The adoption of electronic health records has increased documentation time for hospital staff, leading to more frequent use of abbreviations and acronyms. This brevity can hinder comprehension for both professionals and patients. OBJECTIVE: This review aims to provide an overview of the types of short forms found in clinical narratives, as well as the natural language processing (NLP) techniques used for their identification, expansion, and disambiguation. METHODS: In the databases Web of Science, Embase, MEDLINE, EBMR (Evidence-Based Medicine Reviews), and ACL Anthology, publications that met the inclusion criteria were searched according to PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines for a systematic scoping review. Original, peer-reviewed publications focusing on short-form processing in human clinical narratives were included, covering the period from January 2018 to February 2023. Short-form types were extracted, and multidimensional research methodologies were assigned to each target objective (identification, expansion, and disambiguation). NLP study recommendations and study characteristics were systematically assigned occurrence rates for evaluation. RESULTS: Out of a total of 6639 records, only 19 articles were included in the final analysis. Rule-based approaches were predominantly used for identifying short forms, while string similarity and vector representations were applied for expansion. Embeddings and deep learning approaches were used for disambiguation. CONCLUSIONS: The scope and types of what constitutes a clinical short form were often not explicitly defined by the authors. This lack of definition poses challenges for reproducibility and for determining whether specific methodologies are suitable for different types of short forms. Analysis of a subset of NLP recommendations for assessing quality and reproducibility revealed only partial adherence to these recommendations. Single-character abbreviations were underrepresented in studies on clinical narrative processing, as were investigations in languages other than English. Future research should focus on these 2 areas, and each paper should include descriptions of the types of content analyzed.


Subject(s)
Electronic Health Records , Narration , Natural Language Processing , Humans
3.
Stud Health Technol Inform ; 316: 1463-1464, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176479

ABSTRACT

This paper presents a versatile solution to formally represent the contents of electronic health records. It is based on the knowledge graph paradigm, and semantic web standards RDF and OWL. It employs the established semantic standards SNOMED CT and FHIR, which warrant international interoperability. A graph-based form is not only useful to feed different target visualizations, but it can also be subject to AI-powered services such as (fuzzy) retrieval and summarization.


Subject(s)
Electronic Health Records , Humans , Semantic Web , Systematized Nomenclature of Medicine , Precision Medicine , Semantics , Computer Graphics , Information Storage and Retrieval
4.
Stud Health Technol Inform ; 316: 1750-1751, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176553

ABSTRACT

We present a data visualization tool for tumor boards, merging clinical with molecular and multi-omics data to refine precision oncology decisions. The tool offers a holistic patient perspective, facilitating personalized treatment strategies. By integrating clinical and laboratory datasets, it enables intuitive navigation through complex information. Clinicians are supported in their decision-making by user-friendly visualizations. Future studies are needed to evaluate its real-world impact and usability in precision oncology settings.


Subject(s)
Neoplasms , Precision Medicine , Humans , User-Computer Interface , Medical Oncology , Data Visualization , Decision Support Systems, Clinical
5.
Stud Health Technol Inform ; 316: 695-699, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176890

ABSTRACT

Annotated language resources derived from clinical routine documentation form an intriguing asset for secondary use case scenarios. In this investigation, we report on how such a resource can be leveraged to identify additional term candidates for a chosen set of ICD-10 codes. We conducted a log-likelihood analysis, considering the co-occurrence of approximately 1.9 million de-identified ICD-10 codes alongside corresponding brief textual entries from problem lists in German. This analysis aimed to identify potential candidates with statistical significance set at p < 0.01, which were used as seed terms to harvest additional candidates by interfacing with a large language model in a second step. The proposed approach can identify additional term candidates at suitable performance values: hypernyms MAP@5 = 0.801, synonyms MAP@5 = 0.723, and hyponyms MAP@5 = 0.507. The re-use of existing annotated clinical datasets, in combination with large language models, presents an interesting strategy to bridge the lexical gap between standardized clinical terminologies and real-world jargon.
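The MAP@5 values reported above can be read more easily with a small worked example. The following Python sketch computes mean average precision at k for ranked candidate lists. It is an illustration only: the function names are hypothetical, and the normalization (dividing by the smaller of k and the number of relevant items) is one common convention that the abstract does not confirm.

```python
from typing import Sequence, Set


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """AP@k for one query: sum of precision@i at each rank i holding a relevant item.

    Assumption: normalized by min(len(relevant), k); other conventions exist.
    """
    hits = 0
    precision_sum = 0.0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / i
    denom = min(len(relevant), k)
    return precision_sum / denom if denom else 0.0


def map_at_k(rankings: Sequence[Sequence[str]], relevants: Sequence[Set[str]], k: int = 5) -> float:
    """Mean of AP@k over all queries (for example, one query per seed term)."""
    scores = [average_precision_at_k(r, rel, k) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy data with placeholder candidate names, not taken from the study.
    rankings = [["a", "x", "b", "y", "z"], ["x", "a"]]
    relevants = [{"a", "b"}, {"a"}]
    print(round(map_at_k(rankings, relevants, k=5), 4))
```

In this toy case the first query scores (1/1 + 2/3) / 2 and the second 1/2, so the mean rewards lists that place correct candidates near the top.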


Subject(s)
International Classification of Diseases , Natural Language Processing , Vocabulary, Controlled , Humans , Terminology as Topic , Electronic Health Records/classification , Germany
6.
J Am Med Inform Assoc ; 31(9): 2040-2046, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-38917444

ABSTRACT

OBJECTIVE: To assess the performance of large language models (LLMs) for zero-shot disambiguation of acronyms in clinical narratives. MATERIALS AND METHODS: Clinical narratives in English, German, and Portuguese were used to test the performance of four LLMs: GPT-3.5, GPT-4, Llama-2-7b-chat, and Llama-2-70b-chat. For English, the anonymized Clinical Abbreviation Sense Inventory (CASI, University of Minnesota) was used. For German and Portuguese, at least 500 text spans were processed. The output of the LLMs, prompted with contextual information, was analyzed to compare their acronym disambiguation capability, grouped by document-level metadata, the source language, and the LLM. RESULTS: On CASI, GPT-3.5 achieved 0.91 in accuracy. GPT-4 outperformed GPT-3.5 across all datasets, reaching 0.98 in accuracy for CASI, 0.86 and 0.65 for two German datasets, and 0.88 for Portuguese. Llama models only reached 0.73 for CASI and failed severely for German and Portuguese. Across LLMs, performance decreased from English to German and Portuguese. There was no evidence that additional document-level metadata had a significant effect. CONCLUSION: For English clinical narratives, acronym resolution by GPT-4 can be recommended to improve readability of clinical text for patients and professionals. For German and Portuguese, better models are needed. Llama models, which are particularly interesting for processing sensitive content on premise, cannot yet be recommended for acronym resolution.


Subject(s)
Abbreviations as Topic , Natural Language Processing , Humans , Language , Narration , Electronic Health Records
7.
Bipolar Disord ; 26(4): 364-375, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38531635

ABSTRACT

INTRODUCTION: Owing to the heterogeneous presentation of bipolar disorder, it takes approximately 8.8 years to reach a correct diagnosis. Early recognition and early intervention might increase not only quality of life but also life expectancy in individuals with bipolar disorder. Therefore, we hypothesize that machine learning techniques can support the diagnostic process of bipolar disorder and minimize misdiagnosis rates. MATERIALS AND METHODS: To test this hypothesis, a de-identified data set containing only demographic information and the results of cognitive tests of 196 patients with bipolar disorder and 145 healthy controls was used to train and compare five different machine learning algorithms. RESULTS: The best performing algorithm was logistic regression, with a macro-average F1-score of 0.69 [95% CI 0.66-0.73]. After further optimization, a model with an improved macro-average F1-score of 0.75, a micro-average F1-score of 0.77, and an AUROC of 0.84 was built. Furthermore, the contribution of each variable to the classification was assessed, which revealed that body mass index, results of the Stroop test, and the d2-R test alone allow for a classification of bipolar disorder with equal performance. CONCLUSION: Using these data for clinical application results in an acceptable performance, but has not yet reached a state where it can sufficiently augment a diagnosis made by an experienced clinician. Therefore, further research should focus on identifying the variables that contribute most to a model's classification.
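Because the abstract above reports both macro-average and micro-average F1-scores, a short sketch may help show how the two differ. The Python example below is a minimal illustration on made-up labels, not the study's data or code, and the function name is hypothetical.

```python
from typing import Dict, Hashable, Sequence, Tuple


def f1_macro_micro(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Tuple[float, float, Dict[Hashable, float]]:
    """Return (macro F1, micro F1, per-class F1) for single-label classification."""
    labels = sorted(set(y_true) | set(y_pred), key=str)
    per_class: Dict[Hashable, float] = {}
    tp_sum = fp_sum = fn_sum = 0
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        per_class[label] = 2 * tp / denom if denom else 0.0
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    # Macro: unweighted mean of per-class F1, so every class counts equally.
    macro = sum(per_class.values()) / len(per_class) if per_class else 0.0
    # Micro: F1 over pooled counts, so frequent classes dominate.
    micro_denom = 2 * tp_sum + fp_sum + fn_sum
    micro = 2 * tp_sum / micro_denom if micro_denom else 0.0
    return macro, micro, per_class


if __name__ == "__main__":
    # Toy labels: 1 = patient, 0 = control (illustrative only).
    y_true = [1, 1, 1, 1, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 1]
    macro, micro, per_class = f1_macro_micro(y_true, y_pred)
    print(macro, micro, per_class)
```

With imbalanced classes, as in a cohort of 196 patients and 145 controls, the micro average tends to sit closer to the score of the larger class, which is one reason both values are usually reported together.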


Subject(s)
Bipolar Disorder , Machine Learning , Humans , Bipolar Disorder/diagnosis , Female , Male , Adult , Pilot Projects , Middle Aged , Neuropsychological Tests
8.
BMC Med Inform Decis Mak ; 24(1): 29, 2024 Jan 31.
Article in English | MEDLINE | ID: mdl-38297364

ABSTRACT

BACKGROUND: Oxygen saturation, a key indicator of COVID-19 severity, poses challenges, especially in cases of silent hypoxemia. Electronic health records (EHRs) often contain supplemental oxygen information within clinical narratives. Streamlining patient identification based on oxygen levels is crucial for COVID-19 research, underscoring the need for automated classifiers in discharge summaries to ease the manual review burden on physicians. METHOD: We analysed text lines extracted from anonymised COVID-19 patient discharge summaries in German to perform a binary classification task, differentiating patients who received oxygen supplementation from those who did not. Various machine learning (ML) algorithms, ranging from classical ML to deep learning (DL) models, were compared. Classifier decisions were explained using Local Interpretable Model-agnostic Explanations (LIME), which visualize the model decisions. RESULT: Classical ML and DL models achieved comparable classification performance, with an F-measure varying between 0.942 and 0.955, whereas the classical ML approaches were faster. Visualisation of the embedding representation of the input data reveals notable variations in the encoding patterns between classical and DL encoders. Furthermore, LIME explanations provide insights into the most relevant features at token level that contribute to these observed differences. CONCLUSION: Despite a general tendency towards deep learning, these use cases show that classical approaches yield comparable results at lower computational cost. Model prediction explanations using LIME in textual and visual layouts provided a qualitative explanation for the model performance.


Subject(s)
COVID-19 , Calcium Compounds , Oxides , Humans , Retrospective Studies , Oxygen , Dietary Supplements
9.
Stud Health Technol Inform ; 309: 78-82, 2023 Oct 20.
Article in English | MEDLINE | ID: mdl-37869810

ABSTRACT

Clinical texts are written with acronyms, abbreviations, and medical jargon to save time. This hinders full comprehension for both medical experts and laypeople. This paper attempts to disambiguate acronyms within their given context by comparing a web mining approach via the search engine BING and a conversational agent approach using ChatGPT, with the aim of determining whether these methods can supply a viable resolution for the input acronym. Both approaches are automated via application programming interfaces. Possible term candidates are extracted using natural language processing-oriented functionality. The conversational agent approach surpasses the baseline for web mining without plausibility thresholds in precision, recall, and F1-measure, while scoring similarly only in precision for high threshold values.


Subject(s)
Natural Language Processing , Software , Search Engine , Communication , Writing
10.
J Biomed Inform ; 147: 104497, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37777164

ABSTRACT

A log-likelihood based co-occurrence analysis of ∼1.9 million de-identified ICD-10 codes and related short textual problem list entries generated possible term candidates at a significance level of p<0.01. These top 10 term candidates, consisting of 1 to 5-grams, were used as seed terms for an embedding based nearest neighbor approach to fetch additional synonyms, hypernyms and hyponyms in the respective n-gram embedding spaces by leveraging two different language models. This was done to analyze the lexicality of the resulting term candidates and to compare the term classifications of both models. We found no difference in system performance during the processing of lexical and non-lexical content, i.e. abbreviations, acronyms, etc. Additionally, an application-oriented analysis of the SapBERT (Self-Alignment Pretraining for Biomedical Entity Representations) language model indicates suitable performance for the extraction of all term classifications such as synonyms, hypernyms, and hyponyms.
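The log-likelihood co-occurrence analysis mentioned above can be illustrated with the standard G² statistic on a 2x2 contingency table. The Python sketch below is an assumption about the general technique, not the authors' implementation. The function name and the counts are hypothetical, and the cut-off 6.635 is the chi-square critical value for p < 0.01 with one degree of freedom.

```python
import math

# Chi-square critical value for p < 0.01 with 1 degree of freedom.
CRITICAL_P01 = 6.635


def log_likelihood_g2(k11: int, k12: int, k21: int, k22: int) -> float:
    """G^2 = 2 * sum(O * ln(O / E)) over a 2x2 contingency table.

    k11: entries with the code AND the term     k12: code without the term
    k21: term without the code                  k22: neither code nor term
    """
    total = k11 + k12 + k21 + k22
    if total == 0:
        return 0.0
    observed = ((k11, k12), (k21, k22))
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


if __name__ == "__main__":
    # Hypothetical counts for one (ICD-10 code, n-gram) pair.
    score = log_likelihood_g2(50, 5, 5, 50)
    print(score, score > CRITICAL_P01)
```

An n-gram whose score exceeds the cut-off for a given code would be kept as a seed term candidate under this reading. A table whose cells match the independence expectation yields a score of zero.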


Subject(s)
Language , Natural Language Processing , Likelihood Functions , Cluster Analysis
11.
Stud Health Technol Inform ; 302: 788-792, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203496

ABSTRACT

Clinical information systems have become large repositories for semi-structured and partly annotated electronic health record data, which have reached a critical mass that makes them interesting for supervised data-driven neural network approaches. We explored automated coding of 50-character-long clinical problem list entries using the International Classification of Diseases (ICD-10) and evaluated three different types of network architectures on the top 100 ICD-10 three-digit codes. A fastText baseline reached a macro-averaged F1-score of 0.83, followed by a character-level LSTM with a macro-averaged F1-score of 0.84. The top performing approach used a downstreamed RoBERTa model with a custom language model, yielding a macro-averaged F1-score of 0.88. A neural network activation analysis, together with an investigation of the false positives and false negatives, revealed inconsistent manual coding as a main limiting factor.


Subject(s)
Language , Neural Networks, Computer , International Classification of Diseases , Electronic Health Records
12.
Stud Health Technol Inform ; 302: 825-826, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203507

ABSTRACT

Word vector representations, known as embeddings, are commonly used for natural language processing. Particularly, contextualized representations have been very successful recently. In this work, we analyze the impact of contextualized and non-contextualized embeddings for medical concept normalization, mapping clinical terms via a k-NN approach to SNOMED CT. The non-contextualized concept mapping resulted in a much better performance (F1-score = 0.853) than the contextualized representation (F1-score = 0.322).
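The k-NN mapping step described above can be sketched as a nearest-neighbour lookup by cosine similarity. In the Python example below, the two-dimensional vectors and the concept labels are invented placeholders. A real setup would use embeddings from a language model and actual SNOMED CT identifiers, neither of which is specified here.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_concepts(query: Sequence[float], concepts: Dict[str, Sequence[float]], k: int = 1) -> List[str]:
    """Return the k concept labels whose embeddings are most similar to the query."""
    ranked = sorted(concepts, key=lambda cid: cosine(query, concepts[cid]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Placeholder embeddings; labels are hypothetical, not real SNOMED CT codes.
    concepts = {
        "concept_A": [1.0, 0.0],
        "concept_B": [0.0, 1.0],
        "concept_C": [1.0, 1.0],
    }
    clinical_term_vector = [0.9, 0.1]
    print(knn_concepts(clinical_term_vector, concepts, k=2))
```

The lookup itself is the same for contextualized and non-contextualized embeddings. Only the way the vectors are produced differs, which is where the reported performance gap would originate.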


Subject(s)
Natural Language Processing , Systematized Nomenclature of Medicine
13.
Stud Health Technol Inform ; 302: 827-828, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203508

ABSTRACT

A semi-structured clinical problem list containing ∼1.9 million de-identified entries linked to ICD-10 codes was used to identify closely related real-world expressions. A log-likelihood based co-occurrence analysis generated seed-terms, which were integrated as part of a k-NN search, by leveraging SapBERT for the generation of an embedding representation.


Subject(s)
Electronic Health Records , Natural Language Processing , Likelihood Functions
14.
Stud Health Technol Inform ; 302: 837-838, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203513

ABSTRACT

A large clinical diagnosis list is explored with the goal of clustering syntactic variants. A string similarity heuristic is compared with a deep learning-based approach. Levenshtein distance (LD) applied to common words only (not tolerating deviations in acronyms and tokens with numerals), together with pair-wise substring expansions, raised F1 to 13% above baseline (plain LD), with a maximum F1 of 0.71. In contrast, the model-based approach trained on a German medical language model did not perform better than the baseline, not exceeding an F1 value of 0.42.
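For readers unfamiliar with the baseline named above, plain Levenshtein distance can be computed with the classic dynamic programming recurrence. The Python sketch below shows only that baseline. The restriction to common words and the substring expansions described in the abstract are not reproduced, and the example strings are invented.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    # Two invented spelling variants of the same diagnosis.
    print(levenshtein("Diabetes mellitus Typ 2", "Diabetes mellitus Typ II"))
```

The example also hints at why the heuristic in the abstract treats numerals strictly: a small edit distance between "Typ 2" and "Typ 1" would wrongly suggest that two different diagnoses are variants of each other.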


Subject(s)
Language , Natural Language Processing , Electronic Health Records , Registries , Cluster Analysis
15.
Front Med (Lausanne) ; 10: 1073313, 2023.
Article in English | MEDLINE | ID: mdl-37007792

ABSTRACT

This paper provides an overview of current linguistic and ontological challenges that must be met to fully support the transformation of health ecosystems towards precision medicine (5 PM) standards. It highlights both standardization and interoperability aspects regarding formal, controlled representations of clinical and research data, as well as requirements for smart support to produce and encode content in a way that humans and machines can understand and process. Starting from the current text-centered communication practices in healthcare and biomedical research, it addresses the state of the art in information extraction using natural language processing (NLP). An important aspect of the language-centered perspective of managing health data is the integration of heterogeneous data sources, employing different natural languages and different terminologies. This is where biomedical ontologies, in the sense of formal, interchangeable representations of types of domain entities, come into play. The paper discusses the state of the art of biomedical ontologies, addresses their importance for standardization and interoperability, and sheds light on current misconceptions and shortcomings. Finally, the paper points out next steps and possible synergies between the field of NLP and the area of Applied Ontology and Semantic Web to foster data interoperability for 5 PM.

16.
Stud Health Technol Inform ; 295: 49-50, 2022 Jun 29.
Article in English | MEDLINE | ID: mdl-35773803

ABSTRACT

Wearable sensors and mHealth apps collect fitness and health data outside of clinical settings. These data are essential for precision medicine. This paper addresses and analyzes the available tools for extracting health and fitness data from wearables and mHealth apps. We focus on the most common tools used for research, namely, the Open mHealth-Shimmer application and Fitrockr research platform.


Subject(s)
Mobile Applications , Telemedicine , Exercise
17.
Stud Health Technol Inform ; 270: 83-87, 2020 Jun 16.
Article in English | MEDLINE | ID: mdl-32570351

ABSTRACT

Word embeddings have become the predominant token-level representation scheme for various clinical natural language processing (NLP) tasks. More recently, character-level neural language models, exploiting recurrent neural networks, have again received attention because they achieved similar performance on various NLP benchmarks. We investigated to what extent character-based language models can be applied to the clinical domain and whether they are able to capture reasonable lexical semantics using this maximally fine-grained representation scheme. We trained a long short-term memory network on an excerpt from a table of de-identified 50-character-long problem list entries in German, each of which is assigned to an ICD-10 code. We modelled the task as a time series of one-hot encoded single character inputs. After the training phase, we accessed the top 10 most similar character-induced word embeddings related to a clinical concept via a nearest neighbour search and evaluated the expected interconnected semantics. Results showed that traceable semantics were captured on a syntactic level above single characters, addressing the idiosyncratic nature of clinical language. The results support recent work on general language modelling that raised the question of whether token-based representation schemes are still necessary for specific NLP tasks.
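The input representation described above, a time series of one-hot encoded characters, can be sketched in a few lines of Python. The alphabet, the handling of unknown characters, and the function name below are assumptions made for illustration. The abstract specifies only the one-hot scheme and the 50-character entry length.

```python
from typing import List


def one_hot_encode(text: str, alphabet: str, max_len: int = 50) -> List[List[int]]:
    """Encode each character as a one-hot vector over the given alphabet.

    Assumptions: entries longer than max_len are truncated, and characters
    missing from the alphabet become all-zero vectors.
    """
    index = {char: i for i, char in enumerate(alphabet)}
    sequence = []
    for char in text[:max_len]:
        vector = [0] * len(alphabet)
        if char in index:
            vector[index[char]] = 1
        sequence.append(vector)
    return sequence


if __name__ == "__main__":
    # Hypothetical lowercase alphabet including German umlauts and a space.
    alphabet = "abcdefghijklmnopqrstuvwxyzäöüß "
    encoded = one_hot_encode("z.n. appendektomie", alphabet)
    print(len(encoded), len(encoded[0]))
```

Each inner list is one time step, so a recurrent network such as an LSTM would consume the sequence one character at a time.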


Subject(s)
Language , Natural Language Processing , Cluster Analysis , Neural Networks, Computer , Patient Care , Semantics
18.
J Am Med Inform Assoc ; 26(11): 1247-1254, 2019 Nov 01.
Article in English | MEDLINE | ID: mdl-31512729

ABSTRACT

OBJECTIVE: Automated clinical phenotyping is challenging because word-based features quickly turn it into a high-dimensional problem, in which the small, privacy-restricted training datasets might lead to overfitting. Pretrained embeddings might solve this issue by reusing input representation schemes trained on a larger dataset. We sought to evaluate shallow and deep learning text classifiers and the impact of pretrained embeddings in a small clinical dataset. MATERIALS AND METHODS: We participated in the 2018 National NLP Clinical Challenges (n2c2) Shared Task on cohort selection and received an annotated dataset with medical narratives of 202 patients for multilabel binary text classification. We set our baseline to a majority classifier, to which we compared a rule-based classifier and orthogonal machine learning strategies: support vector machines, logistic regression, and long short-term memory neural networks. We evaluated logistic regression and long short-term memory using both self-trained and pretrained BioWordVec word embeddings as input representation schemes. RESULTS: The rule-based classifier showed the highest overall micro F1 score (0.9100), with which we finished first in the challenge. Shallow machine learning strategies showed lower overall micro F1 scores, but still higher than deep learning strategies and the baseline. We could not show a difference in classification efficiency between self-trained and pretrained embeddings. DISCUSSION: Clinical context, negation, and value-based criteria hindered shallow machine learning approaches, while deep learning strategies could not capture the term diversity due to the small training dataset. CONCLUSION: Shallow methods for clinical phenotyping can still outperform deep learning methods in small imbalanced data, even when supported by pretrained embeddings.


Subject(s)
Clinical Trials as Topic/methods , Data Mining/methods , Machine Learning , Natural Language Processing , Patient Selection , Classification , Deep Learning , Humans , Logistic Models , Neural Networks, Computer
19.
BMC Med Res Methodol ; 19(1): 155, 2019 Jul 18.
Article in English | MEDLINE | ID: mdl-31319802

ABSTRACT

BACKGROUND: The identification of sections in narrative content of Electronic Health Records (EHR) has been shown to improve the performance of clinical extraction tasks; however, there is not yet a shared understanding of the concept and its existing methods. The objective is to report the results of a systematic review concerning approaches aimed at identifying sections in narrative content of EHR, using either automatic or semi-automatic methods. METHODS: This review includes articles from the databases SCOPUS, Web of Science, and PubMed (from January 1994 to September 2018). The selection of studies was done using predefined eligibility criteria and applying the PRISMA recommendations. Search criteria were elaborated by using an iterative and collaborative keyword enrichment. RESULTS: Following the eligibility criteria, 39 studies were selected for analysis. The section identification approaches proposed by these studies vary greatly depending on the kind of narrative, the type of section, and the application. We observed that 57% of them proposed formal methods for identifying sections and 43% adapted a previously created method. Seventy-eight percent were intended for English texts and 41% for discharge summaries. Studies that are able to identify explicit (with headings) and implicit sections correspond to 46%. Regarding the level of granularity, 54% of the studies are able to identify sections, but not subsections. From the technical point of view, the methods can be classified into rule-based methods (59%), machine learning methods (22%), and a combination of both (19%). Hybrid methods showed better results than those relying on pure machine learning approaches, but lower than rule-based methods; however, their scope was more ambitious than the latter ones. Despite all the promising performance results, very few studies reported tests under a formal setup. Almost all the studies relied on custom dictionaries; however, they used them in conjunction with a controlled terminology, most commonly the UMLS® metathesaurus. CONCLUSIONS: Identification of sections in EHR narratives is gaining popularity for improving clinical extraction projects. This study enabled the community working on clinical NLP to gain a formal analysis of this task, including the most successful ways to perform it.


Subject(s)
Electronic Health Records , Narration , Dictionaries as Topic , Humans , Machine Learning , Terminology as Topic
20.
Stud Health Technol Inform ; 258: 184-188, 2019.
Article in English | MEDLINE | ID: mdl-30942742

ABSTRACT

Clinical information systems contain free-text entries in different contexts to be used in a variety of application scenarios. In this study, we investigate to what extent diagnosis codes using the disease classification system ICD-10 can be automatically post-assigned to patient-based short problem list entries (50 characters maximum). Classifiers using random forest and AdaBoost performed best, with an F-measure of 0.87 and 0.85 running against an unbalanced data set, and an F-measure of 0.88 and 0.94 using a balanced data set, respectively.


Subject(s)
International Classification of Diseases , Automation , Humans