1.
Sci Rep ; 13(1): 7815, 2023 May 15.
Article in English | MEDLINE | ID: mdl-37188766

ABSTRACT

Knowledge about the interactions between dietary and biomedical factors is scattered across countless research articles in unstructured form (e.g., text, images) and requires automatic structuring so that it can be provided to medical professionals in a suitable format. Various biomedical knowledge graphs exist; however, they require further extension with relations between food and biomedical entities. In this study, we evaluate the performance of three state-of-the-art relation-mining pipelines (FooDis, FoodChem and ChemDis), which extract relations between food, chemical and disease entities from textual data. We perform two case studies in which relations were automatically extracted by the pipelines and validated by domain experts. The results show that the pipelines can extract relations with an average precision of around 70%, making new discoveries available to domain experts with reduced human effort, since the experts need only evaluate the results instead of finding and reading all new scientific papers.


Subjects
Data Mining , Pattern Recognition, Automated , Humans , Data Mining/methods , Language , Natural Language Processing
2.
J Med Internet Res ; 25: e44870, 2023 May 03.
Article in English | MEDLINE | ID: mdl-37133915

ABSTRACT

BACKGROUND: Medication noncompliance is a critical issue because of the increased number of drugs sold on the web. Web-based drug distribution is difficult to control, causing problems such as drug noncompliance and abuse. The existing medication compliance surveys lack completeness because it is impossible to cover patients who do not go to the hospital or provide accurate information to their doctors, so a social media-based approach is being explored to collect information about drug use. Social media data, which includes information on drug usage by users, can be used to detect drug abuse and medication compliance in patients. OBJECTIVE: This study aimed to assess how the structural similarity of drugs affects the efficiency of machine learning models for text classification of drug noncompliance. METHODS: This study analyzed 22,022 tweets about 20 different drugs. The tweets were labeled as either noncompliant use or mention, noncompliant sales, general use, or general mention. The study compares 2 methods for training machine learning models for text classification: single-sub-corpus transfer learning, in which a model is trained on tweets about a single drug and then tested on tweets about other drugs, and multi-sub-corpus incremental learning, in which models are trained on tweets about drugs in order of their structural similarity. The performance of a machine learning model trained on a single subcorpus (a data set of tweets about a specific category of drugs) was compared to the performance of a model trained on multiple subcorpora (data sets of tweets about multiple categories of drugs). RESULTS: The results showed that the performance of the model trained on a single subcorpus varied depending on the specific drug used for training. The Tanimoto similarity (a measure of the structural similarity between compounds) was weakly correlated with the classification results. 
The model trained by transfer learning on a corpus of drugs with close structural similarity performed better than the model trained by randomly adding a subcorpus when the number of subcorpora was small. CONCLUSIONS: The results suggest that structural similarity improves the classification performance of messages about unknown drugs if the drugs in the training corpus are few. On the other hand, this indicates that there is little need to consider the influence of the Tanimoto structural similarity if a sufficient variety of drugs is ensured.
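
The Tanimoto similarity mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes each drug is represented by a binary structural fingerprint stored as a set of "on" bit indices; the fingerprints below are made up for illustration and are not taken from the study.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity of two fingerprints given as sets of 'on' bit indices."""
    if not a and not b:
        # Convention chosen for this sketch: two empty fingerprints count as identical.
        return 1.0
    # Shared bits divided by bits set in either fingerprint.
    return len(a & b) / len(a | b)


# Hypothetical fingerprints for two drugs (illustrative values only).
drug_a = {1, 4, 7, 9}
drug_b = {1, 4, 8, 9, 12}

print(tanimoto(drug_a, drug_b))  # 3 shared bits / 6 distinct bits = 0.5
```

A value of 1.0 means identical fingerprints and 0.0 means no shared bits; how fingerprints were actually generated in the study is not stated in the abstract.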


Subjects
Social Media , Substance-Related Disorders , Humans , Natural Language Processing , Machine Learning , Commerce
3.
Stud Health Technol Inform ; 301: 248-253, 2023 May 02.
Article in English | MEDLINE | ID: mdl-37172189

ABSTRACT

BACKGROUND: The aging population's need for treatment of chronic diseases is exhibiting a marked increase in urgency, with heart failure being one of the most severe diseases in this regard. To improve outpatient care of these patients and reduce hospitalization rates, the telemedical disease management program HerzMobil was developed in the past. OBJECTIVE: This work aims to analyze the inter-annotator variability among two professional groups (healthcare and engineering) involved in this program's annotation process of free-text clinical notes using categories. METHODS: A dataset of 1,300 text snippets was annotated by 13 annotators with different backgrounds. Inter-annotator variability and accuracy were evaluated using the F1-score and analyzed for differences between categories, annotators, and their professional backgrounds. RESULTS: The results show a significant difference between note categories concerning inter-annotator variability (p<0.0001) and accuracy (p<0.0001). However, there was no statistically significant difference between the two annotator groups, neither concerning inter-annotator variability (p=0.15) nor accuracy (p=0.84). CONCLUSION: Professional background had no significant impact on the annotation of free-text HerzMobil notes.
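
As a rough illustration of the F1-score used in this abstract, the following Python sketch computes a per-category F1 for one annotator against a reference labeling. The category names and labels are hypothetical; the actual HerzMobil categories and the way the study paired annotators are not given in the abstract.

```python
def f1_for_category(reference, predicted, category):
    """F1-score of `predicted` against `reference` for a single category."""
    pairs = list(zip(reference, predicted))
    tp = sum(r == category and p == category for r, p in pairs)
    fp = sum(r != category and p == category for r, p in pairs)
    fn = sum(r == category and p != category for r, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for five text snippets (illustrative only).
reference = ["symptom", "medication", "symptom", "other", "medication"]
annotator = ["symptom", "symptom", "symptom", "other", "medication"]

print(f1_for_category(reference, annotator, "symptom"))
print(f1_for_category(reference, annotator, "medication"))
```

Swapping the reference labeling for a second annotator's labels gives a pairwise agreement score, which is one plausible way to quantify inter-annotator variability with F1.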


Subjects
Electronic Health Records , Heart Failure , Natural Language Processing , Aged , Humans , Heart Failure/therapy , Hospitalization , Austria
4.
PLoS One ; 18(5): e0283553, 2023.
Article in English | MEDLINE | ID: mdl-37196047

ABSTRACT

OBJECTIVE: Diverticular disease (DD) is one of the most prevalent conditions encountered by gastroenterologists, affecting ~50% of Americans before the age of 60. Our aim was to identify genetic risk variants and clinical phenotypes associated with DD, leveraging multiple electronic health record (EHR) data sources of 91,166 multi-ancestry participants with a Natural Language Processing (NLP) technique. MATERIALS AND METHODS: We developed an NLP-enriched phenotyping algorithm that incorporated colonoscopy or abdominal imaging reports to identify patients with diverticulosis and diverticulitis from multicenter EHRs. We performed genome-wide association studies (GWAS) of DD in European, African and multi-ancestry participants, followed by phenome-wide association studies (PheWAS) of the risk variants to identify their potential comorbid/pleiotropic effects in clinical phenotypes. RESULTS: Our developed algorithm showed a significant improvement in patient classification performance for DD analysis (algorithm PPVs ≥ 0.94), with up to a 3.5-fold increase in the number of identified patients compared with the traditional method. Ancestry-stratified analyses of diverticulosis and diverticulitis of the identified subjects replicated the well-established associations between ARHGAP15 loci and DD, showing overall intensified GWAS signals in diverticulitis patients compared to diverticulosis patients. Our PheWAS analyses identified significant associations between the DD GWAS variants and circulatory system, genitourinary, and neoplastic EHR phenotypes. DISCUSSION: As the first multi-ancestry GWAS-PheWAS study, we showcased that heterogeneous EHR data can be mapped through an integrative analytical pipeline and reveal significant genotype-phenotype associations with clinical interpretation.
CONCLUSION: A systematic framework to process unstructured EHR data with NLP could advance a deep and scalable phenotyping for better patient identification and facilitate etiological investigation of a disease with multilayered data.


Subjects
Diverticular Diseases , Diverticulitis , Diverticulum , Humans , Electronic Health Records , Genome-Wide Association Study/methods , Natural Language Processing , Phenotype , Algorithms , Polymorphism, Single Nucleotide
5.
PLoS One ; 18(5): e0285630, 2023.
Article in English | MEDLINE | ID: mdl-37200318

ABSTRACT

Natural Language Processing (NLP) makes use of Artificial Intelligence algorithms to extract meaningful information from unstructured texts, i.e., content that lacks metadata and cannot easily be indexed or mapped onto standard database fields. It has several applications, from sentiment analysis and text summarization to automatic language translation. In this work, we use NLP to identify similar structural linguistic patterns among several different languages. We apply the word2vec algorithm, which creates a vector representation for the words in a multidimensional space that maintains the meaning relationship between the words. From a large corpus we built this vectorial representation in a 100-dimensional space for English, Portuguese, German, Spanish, Russian, French, Chinese, Japanese, Korean, Italian, Arabic, Hebrew, Basque, Dutch, Swedish, Finnish, and Estonian. Then, we calculated the fractal dimensions of the structure that represents each language. The structures are multi-fractals with two different dimensions that we use, in addition to the token-dictionary size rate of the languages, to represent the languages in a three-dimensional space. Finally, analyzing the distance among languages in this space, we conclude that the closeness there tends to be related to the distance in the phylogenetic tree that depicts the lines of evolutionary descent of the languages from a common ancestor.


Subjects
Artificial Intelligence , Fractals , Phylogeny , Language , Translating , Natural Language Processing
6.
Stud Health Technol Inform ; 302: 743-744, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203482

ABSTRACT

In this communication, we demonstrate that the bias observed in domain general training sets with health-related content is not improved in domain specific health-communication corpora, contra.


Subjects
Language , Natural Language Processing , Bias
7.
Stud Health Technol Inform ; 302: 768-772, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203492

ABSTRACT

Previous work has successfully used machine learning and natural language processing for the phenotyping of Rheumatoid Arthritis (RA) patients in hospitals within the United States and France. Our goal is to evaluate the adaptability of RA phenotyping algorithms to a new hospital, both at the patient and encounter levels. Two algorithms are adapted and evaluated with a newly developed RA gold standard corpus, including annotations at the encounter level. The adapted algorithms offer comparably good performance for patient-level phenotyping on the new corpus (F1 0.68 to 0.82), but lower performance for encounter-level (F1 0.54). Regarding adaptation feasibility and cost, the first algorithm incurred a heavier adaptation burden because it required manual feature engineering. However, it is less computationally intensive than the second, semi-supervised, algorithm.


Subjects
Arthritis, Rheumatoid , Electronic Health Records , Humans , Algorithms , Arthritis, Rheumatoid/diagnosis , Machine Learning , Natural Language Processing
8.
Stud Health Technol Inform ; 302: 803-807, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203499

ABSTRACT

Heart failure is a common chronic disease which is associated with high re-hospitalization and mortality rates. Within the telemedicine-assisted transitional care disease management program HerzMobil, monitoring data such as daily measured vital parameters and various other heart failure-related data are collected in a structured way. Additionally, involved healthcare professionals communicate with one another via the system using free-text clinical notes. Since manual annotation of such notes is too time-consuming for routine care applications, an automated analysis process is needed. In the present study, we established a ground truth classification of 636 randomly selected clinical notes from HerzMobil based on annotations of 9 experts with different professional backgrounds (2 physicians, 4 nurses, and 3 engineers). We analyzed the influence of the professional background on the inter-annotator reliability and compared the results with the accuracy of an automated classification algorithm. We found significant differences depending on the profession and on the category. These results indicate that different professional backgrounds should be considered when selecting annotators in such scenarios.


Subjects
Heart Failure , Telemedicine , Humans , Electronic Health Records , Reproducibility of Results , Heart Failure/diagnosis , Heart Failure/therapy , Algorithms , Natural Language Processing
9.
Stud Health Technol Inform ; 302: 815-816, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203502

ABSTRACT

Diagnosis classification in the emergency room (ER) is a complex task. We developed several natural language processing classification models, looking both at the full classification task of 132 diagnostic categories and at several clinically applicable samples consisting of two diagnoses that are hard to distinguish.


Subjects
Emergency Service, Hospital , Natural Language Processing
10.
Stud Health Technol Inform ; 302: 819-820, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203504

ABSTRACT

To classify sentences in cardiovascular German doctor's letters into eleven section categories, we used pattern-exploiting training, a prompt-based method for text classification in few-shot learning scenarios (20, 50 and 100 instances per class) using language models with various pre-training approaches evaluated on CARDIO:DE, a freely available German clinical routine corpus. Prompting improves results by 5-28% accuracy compared to traditional methods, reducing manual annotation efforts and computational costs in a clinical setting.


Subjects
Language , Machine Learning , Natural Language Processing , Learning
11.
Stud Health Technol Inform ; 302: 835-836, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203512

ABSTRACT

The largest publicly funded project to generate a German-language medical text corpus will start in mid-2023. GeMTeX comprises clinical texts from information systems of six university hospitals, which will be made accessible for NLP by annotation of entities and relations, which will be enhanced with additional meta-information. Strong governance provides a stable legal framework for the use of the corpus. State-of-the-art NLP methods are used to build, pre-annotate and annotate the corpus and train language models. A community will be built around GeMTeX to ensure its sustainable maintenance, use, and dissemination.


Subjects
Language , Natural Language Processing , Humans
12.
Stud Health Technol Inform ; 302: 817-818, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203503

ABSTRACT

When patients with cancer develop depression, it is often left untreated. We developed a prediction model for depression risk within the first month after starting cancer treatment using machine learning and Natural Language Processing (NLP) models. The LASSO logistic regression model based on structured data performed well, whereas the NLP model based on only clinician notes did poorly. After further validation, prediction models for depression risk could lead to earlier identification and treatment of vulnerable patients, ultimately improving cancer care and treatment adherence.


Subjects
Depression , Neoplasms , Humans , Depression/diagnosis , Patients , Machine Learning , Risk Assessment , Natural Language Processing , Electronic Health Records , Neoplasms/complications
13.
Stud Health Technol Inform ; 302: 808-812, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203500

ABSTRACT

Many concepts in the medical literature are named after persons. Frequent ambiguities and spelling variants, however, complicate the automatic recognition of such eponyms with natural language processing (NLP) tools. Recently developed methods include word vectors and transformer models that incorporate context information into the downstream layers of a neural network architecture. To evaluate these models for classifying medical eponymy, we label eponyms and counterexamples mentioned in a convenience sample of 1,079 PubMed abstracts, and fit logistic regression models to the vectors from the first (vocabulary) and last (contextualized) layers of a SciBERT language model. According to the area under sensitivity-specificity curves, models based on contextualized vectors achieved a median performance of 98.0% in held-out phrases. This outperformed models based on vocabulary vectors (95.7%) by a median of 2.3 percentage points. When processing unlabeled inputs, such classifiers appeared to generalize to eponyms that did not appear among any annotations. These findings attest to the effectiveness of developing domain-specific NLP functions based on pre-trained language models, and underline the utility of context information for classifying potential eponyms.
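
The "area under sensitivity-specificity curves" reported in this abstract is assumed here to be the usual ROC AUC. Under that assumption, a minimal Python sketch computes it as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The labels and scores are invented for illustration.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 values (1 = eponym, 0 = counterexample in this sketch).
    scores: classifier scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical held-out phrases with classifier scores (illustrative only).
labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
print(roc_auc(labels, scores))  # 5 of 6 positive/negative pairs are ordered correctly
```

The quadratic pairwise loop is chosen for clarity; a rank-based computation gives the same value more efficiently on large samples.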


Subjects
Language , Neural Networks, Computer , Natural Language Processing , PubMed , Unified Medical Language System
14.
Stud Health Technol Inform ; 302: 825-826, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203507

ABSTRACT

Word vector representations, known as embeddings, are commonly used for natural language processing. Particularly, contextualized representations have been very successful recently. In this work, we analyze the impact of contextualized and non-contextualized embeddings for medical concept normalization, mapping clinical terms via a k-NN approach to SNOMED CT. The non-contextualized concept mapping resulted in a much better performance (F1-score = 0.853) than the contextualized representation (F1-score = 0.322).
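
The k-NN mapping described in this abstract can be sketched in Python as a nearest-neighbor search over embedding vectors using cosine similarity. The three-dimensional vectors and the concept identifiers (`concept_A` and so on) are placeholders, not real SNOMED CT codes or embeddings from the study, and the choice of cosine similarity is an assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_normalize(term_vec, concept_vecs, k=1):
    """Return the k concept identifiers whose vectors are closest to term_vec."""
    ranked = sorted(
        concept_vecs,
        key=lambda cid: cosine(term_vec, concept_vecs[cid]),
        reverse=True,
    )
    return ranked[:k]


# Placeholder concept embeddings (hypothetical identifiers and values).
concept_vecs = {
    "concept_A": [1.0, 0.0, 0.0],
    "concept_B": [0.0, 1.0, 0.0],
    "concept_C": [0.7, 0.7, 0.0],
}

# Embedding of a clinical term to be normalized (also hypothetical).
term_vec = [0.9, 0.1, 0.0]
print(knn_normalize(term_vec, concept_vecs, k=2))  # ['concept_A', 'concept_C']
```

The same search works for contextualized and non-contextualized embeddings alike; only the way the vectors are produced differs.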


Subjects
Natural Language Processing , Systematized Nomenclature of Medicine
15.
Stud Health Technol Inform ; 302: 831-832, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203510

ABSTRACT

Neural network language models, such as BERT, can be used for information extraction from medical texts with unstructured free text. These models can be pre-trained on a large corpus to learn the language and characteristics of the relevant domain and then fine-tuned with labeled data for a specific task. We propose a pipeline using human-in-the-loop labeling to create annotated data for Estonian healthcare information extraction. This method is particularly useful for low-resource languages and is more accessible to those in the medical field than rule-based methods like regular expressions.


Subjects
Information Storage and Retrieval , Natural Language Processing , Humans , Neural Networks, Computer , Language , Health Facilities
16.
Stud Health Technol Inform ; 302: 829-830, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203509

ABSTRACT

Written text has been the preferred medium for storing health data ever since Hippocrates, and the medical narrative is what enables a humanized clinical relationship. Can't we admit natural language as a user-accepted technology that has stood the test of time? We have previously presented a controlled natural language as a human-computer interface for semantic data capture already at the point of care. Our computable language was driven by a linguistic interpretation of the conceptual model of the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT). This paper presents an extension that allows the capture of measurement results with numerical values and units. We discuss the relation our method can have with emerging clinical information modelling.


Subjects
Language , Natural Language Processing , Humans , Semantics , Systematized Nomenclature of Medicine , Unified Medical Language System
17.
Stud Health Technol Inform ; 302: 833-834, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203511

ABSTRACT

Retrieving health information is the task of searching for health-related information from a variety of sources. Gathering self-reported health information may help enrich the body of knowledge about a disease and its symptoms. We investigated retrieving symptom mentions in COVID-19-related Twitter posts with a pretrained large language model (GPT-3) without providing any examples (zero-shot learning). We introduced a new performance measure of total match (TM) to include exact, partial and semantic matches. Our results show that the zero-shot approach is a powerful method without the need to annotate any data, and it can assist in generating instances for few-shot learning which may achieve better performance.


Subjects
COVID-19 , Social Media , Humans , Language , Semantics , Natural Language Processing
18.
Stud Health Technol Inform ; 302: 827-828, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203508

ABSTRACT

A semi-structured clinical problem list containing ∼1.9 million de-identified entries linked to ICD-10 codes was used to identify closely related real-world expressions. A log-likelihood based co-occurrence analysis generated seed-terms, which were integrated as part of a k-NN search, by leveraging SapBERT for the generation of an embedding representation.
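
The abstract does not say which log-likelihood statistic was used for the co-occurrence analysis. One common choice, assumed here purely for illustration, is the log-likelihood ratio (G²) on a 2x2 contingency table counting how often a term appears in entries with and without a given ICD-10 code. A minimal Python sketch with invented counts:

```python
import math


def log_likelihood_g2(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 co-occurrence table.

    k11: entries with the term and with the code
    k12: entries with the term but without the code
    k21: entries without the term but with the code
    k22: entries with neither
    """
    observed = ((k11, k12), (k21, k22))
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    total = rows[0] + rows[1]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


# Hypothetical counts (illustrative only).
print(log_likelihood_g2(30, 5, 5, 60))    # strong association: large value
print(log_likelihood_g2(10, 10, 10, 10))  # independence: 0.0
```

Terms with high scores for a code would be candidate seed terms; the threshold and the exact table construction used in the study are not stated.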


Subjects
Electronic Health Records , Natural Language Processing , Likelihood Functions
19.
Stud Health Technol Inform ; 302: 837-838, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203513

ABSTRACT

A large clinical diagnosis list is explored with the goal to cluster syntactic variants. A string similarity heuristic is compared with a deep learning-based approach. Levenshtein distance (LD) applied to common words only (not tolerating deviations in acronyms and tokens with numerals), together with pair-wise substring expansions raised F1 to 13% above baseline (plain LD), with a maximum F1 of 0.71. In contrast, the model-based approach trained on a German medical language model did not perform better than the baseline, not exceeding an F1 value of 0.42.
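
The string-similarity heuristic in this abstract can be sketched in Python as a standard Levenshtein distance plus a token comparison that tolerates small deviations only in common words. The rule for what counts as an acronym (an all-uppercase token) and the `max_dist` threshold are assumptions made for this sketch, not details from the study.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def is_strict(token: str) -> bool:
    """Tokens that must match exactly: assumed acronyms and tokens with numerals."""
    return token.isupper() or any(ch.isdigit() for ch in token)


def tokens_match(a: str, b: str, max_dist: int = 1) -> bool:
    """Fuzzy match for common words, exact match for strict tokens."""
    if is_strict(a) or is_strict(b):
        return a == b
    return levenshtein(a.lower(), b.lower()) <= max_dist


print(levenshtein("kitten", "sitting"))      # 3
print(tokens_match("Diabetes", "Diabetis"))  # True: one substitution in a common word
print(tokens_match("COPD", "COPF"))          # False: acronyms must match exactly
```

The pair-wise substring expansions mentioned in the abstract are not covered by this sketch.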


Subjects
Language , Natural Language Processing , Electronic Health Records , Records , Cluster Analysis
20.
Stud Health Technol Inform ; 302: 1037-1041, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203576

ABSTRACT

In the context of medical concept extraction, it is critical to determine if clinical signs or symptoms mentioned in the text were present or absent, experienced by the patient or their relatives. Previous studies have focused on the NLP aspect but not on how to leverage this supplemental information for clinical applications. In this paper, we aim to use the patient similarity networks framework to aggregate different phenotyping modalities. NLP techniques were applied to extract phenotypes and predict their modalities from 5470 narrative reports of 148 patients with ciliopathies (a group of rare diseases). Patient similarities were computed using each modality separately for aggregation and clustering. We found that aggregating negated phenotypes improved patient similarity, but further aggregating relatives' phenotypes worsened the result. We suggest that different modalities of phenotypes can contribute to patient similarity, but they should be aggregated carefully and with appropriate similarity metrics and aggregation models.


Subjects
Electronic Health Records , Narration , Humans , Phenotype , Rare Diseases , Natural Language Processing