Results 1 - 20 of 58
1.
Yearb Med Inform ; 32(1): 244-252, 2023 Aug.
Article in English | MEDLINE | ID: mdl-38147866

ABSTRACT

OBJECTIVES: To analyse the content of publications within the medical Natural Language Processing (NLP) domain in 2022. METHODS: Automatic and manual preselection of publications to be reviewed, and selection of the best NLP papers of the year. Analysis of the important issues. RESULTS: Three best papers have been selected. We also propose an analysis of the content of the NLP publications in 2022, highlighting some of the topics. CONCLUSION: The main trend in 2022 is certainly related to the availability of large language models, especially those based on Transformers, and to their use by non-NLP researchers. This leads to the democratization of NLP methods. We also observe renewed interest in languages other than English, the continuation of research on information extraction and prediction, the massive use of data from social media, and the consideration of the needs and interests of patients.


Subject(s)
Information Storage and Retrieval , Natural Language Processing , Humans
2.
Yearb Med Inform ; 31(1): 254-260, 2022 Aug.
Article in English | MEDLINE | ID: mdl-36463883

ABSTRACT

OBJECTIVES: Analyze the content of publications within the medical natural language processing (NLP) domain in 2021. METHODS: Automatic and manual preselection of publications to be reviewed, and selection of the best NLP papers of the year. Analysis of the important issues. RESULTS: Four best papers have been selected in 2021. We also propose an analysis of the content of the NLP publications in 2021, all topics included. CONCLUSIONS: The main issues addressed in 2021 are related to the investigation of COVID-related questions and to the further adaptation and use of transformer models. Besides, the trends from the past years continue, such as information extraction and use of information from social networks.


Subject(s)
COVID-19 , Natural Language Processing , Humans , Information Storage and Retrieval , Social Networking
3.
Stud Health Technol Inform ; 294: 634-638, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612166

ABSTRACT

The reduction of the linguistic complexity of medical texts to make them more understandable to a larger population is an important task. The simplification of texts involves several steps, among which our study focuses on the definition of complex constructions and on the study of the impact of the simplification. For this study, we selected 20 texts from the medical domain on different topics, namely drugs, diseases, substances, and medical institutions. We identified complex linguistic constructions and carried out their manual simplification at syntactic, lexical and semantic levels. We then designed a questionnaire to test comprehension of the texts and conducted a study with 26 participants. The results of this study show that simplified texts obtained a higher number of correct answers than technical texts. This difference is statistically significant. The self-evaluation questionnaire, done at the beginning of the test, indicates that the participants tend to overestimate their understanding of medical information. Besides, there is no correlation between the time taken to complete the interview and the correct answers provided.


Subject(s)
Comprehension , Semantics , Humans , Linguistics , Surveys and Questionnaires
4.
Stud Health Technol Inform ; 294: 868-869, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612229

ABSTRACT

We address the problem of semantic labeling of terms in two French medical corpora with the subset of the UMLS. We perform two experiments relying on the structure of words and terms, and on their context: 1) the semantic label of already identified terms is predicted; 2) the terms are detected in raw texts and their semantic label is predicted. Our results show over 0.90 F-measure.


Subject(s)
Semantics , Unified Medical Language System , Natural Language Processing
5.
Yearb Med Inform ; 30(1): 257-263, 2021 Aug.
Article in English | MEDLINE | ID: mdl-34479397

ABSTRACT

OBJECTIVES: To analyze the content of publications within the medical NLP domain in 2020. METHODS: Automatic and manual preselection of publications to be reviewed, and selection of the best NLP papers of the year. Analysis of the important issues. RESULTS: Three best papers have been selected in 2020. We also propose an analysis of the content of the NLP publications in 2020, all topics included. CONCLUSION: The two main issues addressed in 2020 are related to the investigation of COVID-related questions and to the further adaptation and use of transformer models. Besides, the trends from the past years continue, such as diversification of languages processed and use of information from social networks.


Subject(s)
COVID-19 , Natural Language Processing , Social Networking , Clinical Trials as Topic , Humans , Medical Records , Mental Disorders
6.
Stud Health Technol Inform ; 281: 308-312, 2021 May 27.
Article in English | MEDLINE | ID: mdl-34042755

ABSTRACT

Easy access to medical and health information for children, foreigners and patients is an important issue for modern society and research. Indeed, misunderstanding of medical and health information by patients may have a negative impact on their healthcare process and health. Even if several simplification guidelines exist, they are difficult for medical experts to use (e.g. lack of time, difficulty meeting the criteria). Existing simplification systems mainly address some lexical or syntactic transformations. We propose to combine lexical and syntactic simplifications within one rule-based system and to make the process fine-grained thanks to better control of the grammaticality of sentences.


Subject(s)
Language , Child , Humans
7.
Stud Health Technol Inform ; 281: 313-317, 2021 May 27.
Article in English | MEDLINE | ID: mdl-34042756

ABSTRACT

Abbreviations are very frequent in medical and health documents but they convey opaque semantics. Associating them with their expanded forms, like Chronic obstructive pulmonary disease for COPD, may help in understanding them. Yet, several abbreviations are ambiguous and have several possible expanded forms. We propose to disambiguate the abbreviations in order to associate them with the proper expansion for a given context. We treat the problem through supervised categorization. We create reference data and test several algorithms. The descriptors are collected from lexical and syntactic contexts of abbreviations. The training is done on sentences containing expanded forms of abbreviations. The test is done on a manually built corpus, in which the meaning of abbreviations is defined according to their contexts. Our approach shows up to 0.895 F-measure on training data and 0.773 on test data.


Subject(s)
Algorithms , Semantics , Language , Natural Language Processing
8.
J Biomed Semantics ; 11(1): 7, 2020 Aug 06.
Article in English | MEDLINE | ID: mdl-32762729

ABSTRACT

BACKGROUND: Textual corpora are extremely important for various NLP applications as they provide information necessary for creating, setting and testing those applications and the corresponding tools. They are also crucial for designing reliable methods and obtaining reproducible results. Yet, in some areas, such as the medical area, due to confidentiality or to ethical reasons, it is complicated or even impossible to access representative textual data. We propose the CAS corpus, built from clinical cases as they are reported in the published scientific literature in French. RESULTS: Currently, the corpus contains 4,900 clinical cases in French, totaling nearly 1.7M word occurrences. Some clinical cases are associated with discussions. A subset of the whole set of cases is enriched with morpho-syntactic (PoS-tagging, lemmatization) and semantic (the UMLS concepts, negation, uncertainty) annotations. The corpus is being continuously enriched with new clinical cases and annotations. The CAS corpus has been compared with similar clinical narratives. When computed on tokenized and lowercased words, the Jaccard index indicates that the similarity between clinical cases and narratives reaches up to 0.9727. CONCLUSION: We assume that the CAS corpus can be effectively exploited for the development and testing of NLP tools and methods. Besides, the corpus will be used in NLP challenges and distributed to the research community.


Subject(s)
Natural Language Processing , Electronic Health Records , Information Storage and Retrieval , Semantics
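The Jaccard comparison mentioned in the abstract above can be illustrated with a minimal sketch. The abstract only states that the index is computed on tokenized, lowercased words; the regular-expression tokenizer, the function names `tokenize` and `jaccard`, and the two sample sentences are assumptions made for illustration and are not taken from the CAS corpus or the authors' code.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of distinct word tokens.

    Assumption: a simple \\w+ regex stands in for the (unspecified) tokenizer.
    """
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; taken to be 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


# Hypothetical sentences, invented for this example only.
cases = tokenize("Le patient présente une fièvre persistante.")
narratives = tokenize("La patiente présente une fièvre.")

similarity = jaccard(cases, narratives)
print(f"Jaccard index: {similarity:.3f}")
```

On real corpora the two sets would be the full vocabularies of each collection, so the index measures lexical overlap rather than sentence-level similarity.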
9.
Yearb Med Inform ; 29(1): 221-225, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32823319

ABSTRACT

OBJECTIVES: Analyze papers published in 2019 within the medical natural language processing (NLP) domain in order to select the best works of the field. METHODS: We performed an automatic and manual pre-selection of papers to be reviewed and finally selected the best NLP papers of the year. We also propose an analysis of the content of NLP publications in 2019. RESULTS: Three best papers have been selected this year including the generation of synthetic record texts in Chinese, a method to identify contradictions in the literature, and the BioBERT word representation. CONCLUSIONS: The year 2019 was very rich and various NLP issues and topics were addressed by research teams. This shows the will and capacity of researchers to move towards robust and reproducible results. Researchers also prove to be creative in addressing original issues with relevant approaches.


Subject(s)
Data Mining/methods , Electronic Health Records , Natural Language Processing , Information Storage and Retrieval/methods
10.
Stud Health Technol Inform ; 270: 362-366, 2020 Jun 16.
Article in English | MEDLINE | ID: mdl-32570407

ABSTRACT

Parallel sentences provide semantically similar information which can vary on a given dimension, such as language or register. Parallel sentences with register variation (like expert and non-expert documents) can be exploited for automatic text simplification. The aim of automatic text simplification is to make given information easier to access and understand. In the biomedical field, simplification may permit patients to understand medical and health texts. Yet, there are currently no such resources available. We propose to exploit comparable corpora which are distinguished by their registers (specialized and simplified versions) to detect and align parallel sentences. These corpora are in French and are related to the biomedical area. We treat this task as binary classification (alignment/non-alignment). Our results show that the method we present here can be used to automatically generate a corpus of parallel sentences from our comparable corpus.


Subject(s)
Language , Natural Language Processing , Comprehension , Semantics , Unified Medical Language System
11.
Stud Health Technol Inform ; 270: 427-431, 2020 Jun 16.
Article in English | MEDLINE | ID: mdl-32570420

ABSTRACT

Automatic detection of ICD-10 codes in clinical documents has become a necessity. In this article, after a brief reminder of the existing work, we present a corpus of French clinical narratives annotated with the ICD-10 codes. Then, we propose automatic methods based on neural network approaches for the automatic detection of the ICD-10 codes. The results show that we need 1) more examples per class given the number of classes to assign, and 2) a better word/concept vector representation of documents in order to accurately assign codes.


Subject(s)
International Classification of Diseases , Supervised Machine Learning , Clinical Coding , Electronic Health Records , Narration , Neural Networks, Computer
12.
Stud Health Technol Inform ; 264: 30-34, 2019 Aug 21.
Article in English | MEDLINE | ID: mdl-31437879

ABSTRACT

Non-compliance situations happen when patients do not follow their prescriptions and take actions that lead to potentially harmful situations. Although such situations are dangerous, patients usually do not report them to their physicians. Hence, it is necessary to study other sources of information. We propose to study online health fora. The purpose of our work is to explore online health fora with supervised classification and information retrieval methods in order to identify messages that contain drug non-compliance. The supervised classification method permits detection of non-compliance with up to 0.824 F-measure, while the information retrieval method permits detection of non-compliance with up to 0.529 F-measure. For some fine-grained categories and new data, it shows up to 0.65-0.70 precision.


Subject(s)
Information Storage and Retrieval , Internet , Patient Compliance , Humans , Machine Learning , Physicians
13.
Stud Health Technol Inform ; 264: 1327-1331, 2019 Aug 21.
Article in English | MEDLINE | ID: mdl-31438141

ABSTRACT

Detection of words that are difficult to understand is a crucial task for ensuring the proper understanding of medical texts such as diagnoses and drug instructions. We propose to combine supervised machine learning algorithms using various features with word embeddings which contain context information of words. Data in French are manually cross-annotated by seven annotators. On the basis of these data, we propose cross-validation scenarios in order to test the generalization ability of models to detect the difficulty of medical words. On data provided by seven annotators, we show that the models are generalizable from one annotator to another.


Subject(s)
Algorithms , Comprehension , Language , Natural Language Processing , Supervised Machine Learning
14.
Yearb Med Inform ; 28(1): 218-222, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31419835

ABSTRACT

OBJECTIVES: To analyze the content of publications within the medical Natural Language Processing (NLP) domain in 2018. METHODS: Automatic and manual pre-selection of publications to be reviewed, and selection of the best NLP papers of the year. Analysis of the important issues. RESULTS: Two best papers have been selected this year: one dedicated to the generation of multi-document summaries and another dedicated to the generation of imaging reports. We also proposed an analysis of the main research trends of NLP publications in 2018. CONCLUSIONS: The year 2018 is very rich with regard to NLP issues and topics addressed. It shows the will of researchers to go towards robust and reproducible results. Researchers also prove to be creative in addressing original issues and approaches.


Subject(s)
Diagnostic Imaging , Natural Language Processing , Electronic Health Records , Humans , Image Processing, Computer-Assisted
15.
Health Informatics J ; 25(1): 17-26, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30871399

ABSTRACT

More and more health websites hire medical experts (physicians, medical students, experienced volunteers, etc.) and indicate explicitly their medical role in order to signal that they provide high-quality answers. However, medical experts may participate in forum discussions even when their role is not officially indicated. Detecting posts written by medical experts facilitates quick access to posts that are more likely to be correct and informative. The main objective of this work is to learn classification models that can be used to detect posts written by medical experts in any health forum discussion. Two French health forums have been used to discover the best features and methods for this text categorization task. The obtained results confirm that models learned on appropriate websites may be used efficiently on other websites (an F1-measure above 98% has been obtained using a Random Forest classifier). A study of misclassified posts highlights the participation of medical experts in forum discussions even if their role is not explicitly indicated.


Subject(s)
Clinical Competence/standards , Social Media/instrumentation , Clinical Competence/statistics & numerical data , France , Humans , Internet , Interpersonal Relations , Social Media/standards , Social Media/trends
16.
Front Pharmacol ; 9: 791, 2018.
Article in English | MEDLINE | ID: mdl-30140224

ABSTRACT

Drug misuse may happen when patients do not follow the prescriptions and take actions which lead to potentially harmful situations, such as intakes of incorrect dosage (overuse or underuse) or drug use for indications different from those prescribed. Although such situations are dangerous, patients usually do not report the misuse of drugs to their physicians. Hence, other sources of information are necessary for studying these issues. We assume that online health fora can provide such information and propose to exploit them. The general purpose of our work is the automatic detection and classification of drug misuses by analysing user-generated data in French social media. To this end, we propose a multi-step method, the main steps of which are: (1) indexing of messages with an extended vocabulary adapted to social media writing; (2) creation of a typology of drug misuses; and (3) automatic classification of messages according to whether they contain drug misuses or not. We present the results obtained at different steps and discuss them. The proposed method detects misuses with up to 0.773 F-measure.

17.
Stud Health Technol Inform ; 247: 351-355, 2018.
Article in English | MEDLINE | ID: mdl-29677981

ABSTRACT

Patients seldom report the misuse of drugs to their physicians. Hence, other sources of information are necessary for studying these issues. We assume that online health fora can provide such information and propose to exploit them for building a typology of drug misuses. The misuses detected are structured according to the goals of patients: we distinguished three types of non-intentional misuses and 14 types of intentional misuses. This work will be used to guide the future task of automatic extraction of drug misuses.


Subject(s)
Drug Misuse , Physicians , Adverse Drug Reaction Reporting Systems , Data Mining , Humans
18.
J Eval Clin Pract ; 24(3): 536-544, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29532572

ABSTRACT

RATIONALE, AIMS, AND OBJECTIVES: The spontaneous reporting system currently used in pharmacovigilance is not sufficiently exhaustive to detect all adverse drug reactions (ADRs). With the widespread use of electronic health records, biomedical data collected during the clinical care process can be reused and analysed to better detect ADRs. The aim of this study was to assess whether querying a Clinical Data Warehouse (CDW) could increase the detection of drug-induced anaphylaxis. METHODS: All known cases of drug-induced anaphylaxis that occurred or required hospitalization at Rennes Academic Hospital in 2011 (n = 19) were retrieved from the French pharmacovigilance database, which contains all reported ADR events. Then, from the Rennes Academic Hospital CDW, a training set (all patients hospitalized in 2011) and a test set (all patients hospitalized in 2012) were extracted. The training set was used to define an optimized query, by building a set of keywords (based on the known cases) and exclusion criteria to search structured and unstructured data within the CDW in order to identify at least all known cases of drug-induced anaphylaxis for 2011. Then, the real performance of the optimized query was tested in the test set. RESULTS: Using the optimized query, 59 cases of drug-induced anaphylaxis were identified among the 253 patient records extracted from the test set as possible anaphylaxis cases. Specifically, the optimal query identified 41 drug-induced anaphylaxis cases that were not detected by searching the French pharmacovigilance database but missed 7 cases detected only by spontaneous reporting. DISCUSSION: We proposed an information retrieval-based method for detecting drug-induced anaphylaxis, by querying structured and unstructured data in a CDW. CDW queries are less specific than spontaneous reporting and Diagnosis-related Groups queries, although their sensitivity is much higher. CDW queries can facilitate monitoring by pharmacovigilance experts. 
Our method could easily be incorporated into routine practice.


Subject(s)
Anaphylaxis/chemically induced , Anaphylaxis/epidemiology , Drug-Related Side Effects and Adverse Reactions/diagnosis , Patient Safety , Adverse Drug Reaction Reporting Systems , Databases, Factual , France/epidemiology , Humans , Medical Errors , Pharmacovigilance
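The counts reported in the abstract above can be turned into a short worked calculation. This is only an illustrative sketch: it assumes that the 7 cases found only by spontaneous reporting belong to the same test period, and that the union of both sources (59 + 7 cases) can serve as the reference set. The derived ratios are not figures reported by the authors, and the variable names are invented for the example.

```python
# Counts stated in the abstract (test set).
flagged_records = 253        # records returned by the optimized CDW query
confirmed_by_query = 59      # drug-induced anaphylaxis cases among them
query_only = 41              # confirmed cases absent from the pharmacovigilance database
reporting_only = 7           # cases found only through spontaneous reporting

# Derived counts (assumption: both sources cover the same period).
found_by_both = confirmed_by_query - query_only          # 18
reference_cases = confirmed_by_query + reporting_only    # 66, union of both sources

# Share of flagged records that were true cases (positive predictive value).
ppv_query = confirmed_by_query / flagged_records

# Share of reference cases recovered by each source (sensitivity w.r.t. the union).
sensitivity_query = confirmed_by_query / reference_cases
sensitivity_reporting = (found_by_both + reporting_only) / reference_cases

print(f"PPV of CDW query:             {ppv_query:.3f}")
print(f"Sensitivity of CDW query:     {sensitivity_query:.3f}")
print(f"Sensitivity of reporting:     {sensitivity_reporting:.3f}")
```

Under these assumptions the arithmetic matches the abstract's qualitative statement: the query recovers far more of the cases than spontaneous reporting, at the cost of many flagged records that turn out not to be cases.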
19.
Stud Health Technol Inform ; 235: 521-525, 2017.
Article in English | MEDLINE | ID: mdl-28423847

ABSTRACT

Technical medical terms are difficult for non-experts to understand correctly. A vocabulary associating technical terms with lay expressions can help to increase the readability of technical texts and their understanding. The purpose of our work is to build this kind of vocabulary. We propose to exploit the notion of reformulation following two methods: extraction of abbreviations and of reformulations with specific markers. The segments associated thanks to these methods are aligned with medical terminologies. Our results cover over 9,000 medical terms and show precision of extractions between 0.24 and 0.98. The results are analyzed and compared with the existing work.


Subject(s)
Terminology as Topic , Vocabulary, Controlled , Comprehension
20.
Stud Health Technol Inform ; 216: 815-20, 2015.
Article in English | MEDLINE | ID: mdl-26262165

ABSTRACT

With the recent and intensive research in the biomedical area, the knowledge accumulated is disseminated through various knowledge bases. Links between these knowledge bases are needed in order to use them jointly. Linked Data, the SPARQL language, and natural language question-answering interfaces provide interesting solutions for querying such knowledge bases. We propose a method for translating natural language questions into SPARQL queries. We use Natural Language Processing tools, semantic resources, and the RDF triples description. The method is designed on 50 questions over 3 biomedical knowledge bases, and evaluated on 27 questions. It achieves 0.78 F-measure on the test set. The method for translating natural language questions into SPARQL queries is implemented as a Perl module available at http://search.cpan.org/ thhamon/RDF-NLP-SPARQLQuery.


Subject(s)
Information Storage and Retrieval , Natural Language Processing , Databases, Factual , Humans , Information Storage and Retrieval/methods , Semantics