1.
J Biomed Inform ; 140: 104325, 2023 04.
Article in English | MEDLINE | ID: mdl-36870586

ABSTRACT

Monoclonal antibodies (MAs) are increasingly used in the therapeutic arsenal. Clinical Data Warehouses (CDWs) offer unprecedented opportunities for research on real-world data. The objective of this work is to develop a knowledge organisation system on MAs for therapeutic use (MATUs) applicable in Europe to query CDWs from a multi-terminology server (HeTOP). After expert consensus, three main health thesauri were selected: the MeSH thesaurus, the National Cancer Institute thesaurus (NCIt) and SNOMED CT. These thesauri contain 1,723 MA concepts, but only 99 (5.7 %) are identified as MATUs. The knowledge organisation system proposed in this article is a six-level hierarchical system that classifies MATUs according to their main therapeutic target. It includes 193 different concepts organised in a cross-lingual terminology server, which will allow the inclusion of semantic extensions. Ninety-nine (51.3 %) MATU concepts and 94 (48.7 %) hierarchical concepts compose the knowledge organisation system. Two separate groups (an expert group and a validation group) carried out the selection, creation and validation processes. Queries identify, for unstructured data, 83 out of 99 (83.8 %) MATUs corresponding to 45,262 patients, 347,035 hospital stays and 427,544 health documents, and for structured data, 61 out of 99 (61.6 %) MATUs corresponding to 9,218 patients, 59,643 hospital stays and 104,737 hospital prescriptions. The volume of data in the CDW demonstrated the potential for using these data in clinical research, although not all MATUs are present in the CDW (16 missing for unstructured data and 38 for structured data). The knowledge organisation system proposed here improves the understanding of MATUs and the quality of queries, and helps clinical researchers retrieve relevant medical information. The use of this model in a CDW allows for the rapid identification of a large number of patients and health documents, either directly by a MATU of interest (e.g. Rituximab) or by searching for parent concepts (e.g. Anti-CD20 Monoclonal Antibody).
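
The parent-concept search described in this abstract can be pictured as a query expansion over a concept hierarchy. The following is a minimal sketch, not the HeTOP implementation: the dictionary, the function names, the root label and the second anti-CD20 entry are hypothetical; only Rituximab and its parent concept come from the abstract.

```python
from collections import defaultdict

# Hypothetical fragment of a hierarchy, stored as child -> parent.
# Only "Rituximab" and "Anti-CD20 Monoclonal Antibody" appear in the abstract.
PARENT_OF = {
    "Rituximab": "Anti-CD20 Monoclonal Antibody",
    "Hypothetical anti-CD20 agent": "Anti-CD20 Monoclonal Antibody",
    "Anti-CD20 Monoclonal Antibody": "Monoclonal Antibody for Therapeutic Use",
}


def build_children(parent_of):
    """Invert the child -> parent map into parent -> list of children."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)
    return children


def expand(concept, children):
    """Return the concept followed by all of its descendants (depth-first)."""
    result = [concept]
    for child in children.get(concept, []):
        result.extend(expand(child, children))
    return result


CHILDREN = build_children(PARENT_OF)
# Querying a parent concept retrieves every MATU filed under it.
expanded = expand("Anti-CD20 Monoclonal Antibody", CHILDREN)
```

A query on a leaf concept returns only that concept, while a query on a parent concept returns the whole subtree, which is the behaviour the abstract attributes to parent-concept searches.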


Subject(s)
Antibodies, Monoclonal; Vocabulary, Controlled; Humans; Antibodies, Monoclonal/therapeutic use; Systematized Nomenclature of Medicine; Data Warehousing; Europe
2.
BMC Med Inform Decis Mak ; 22(1): 34, 2022 02 08.
Article in English | MEDLINE | ID: mdl-35135538

ABSTRACT

BACKGROUND: Unstructured data from electronic health records represent a wealth of information. Doc'EDS is a pre-screening tool based on textual and semantic analysis. The Doc'EDS system provides a graphic user interface to search documents in French. The aim of this study was to present the Doc'EDS tool and to provide a formal evaluation of its semantic features. METHODS: Doc'EDS is a search tool built on top of the clinical data warehouse developed at Rouen University Hospital. This tool is a multilevel search engine combining structured and unstructured data. It also provides basic analytical features and semantic utilities. A formal evaluation was conducted to measure the impact of Natural Language Processing algorithms. RESULTS: Approximately 18.1 million narrative documents are stored in Doc'EDS. The formal evaluation was conducted in 5000 clinical concepts that were manually collected. The F-measures of negative concepts and hypothetical concepts were respectively 0.89 and 0.57. CONCLUSION: In this formal evaluation, we have shown that Doc'EDS is able to deal with language subtleties to enhance an advanced full text search in French health documents. The Doc'EDS tool is currently used on a daily basis to help researchers to identify patient cohorts thanks to unstructured data.
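
For readers unfamiliar with the metric reported above, here is a small sketch of how an F-measure is derived from annotation counts. It assumes the balanced F1 variant (the abstract does not say which variant was used), and the counts are illustrative, not taken from the study.

```python
def f_measure(tp, fp, fn):
    """Balanced F-measure (F1) from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only: 8 correctly detected negated concepts,
# 2 spurious detections, 2 missed ones.
example = f_measure(tp=8, fp=2, fn=2)
```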


Subject(s)
Data Warehousing; Semantics; Electronic Health Records; Humans; Natural Language Processing; Search Engine
4.
J Med Internet Res ; 16(12): e271, 2014 Dec 01.
Article in English | MEDLINE | ID: mdl-25448528

ABSTRACT

BACKGROUND: PubMed contains numerous articles in languages other than English. However, existing solutions to access these articles in the language in which they were written remain unconvincing. OBJECTIVE: The aim of this study was to propose a practical search engine, called Multilingual PubMed, which will permit access to a PubMed subset in 1 language and to evaluate the precision and coverage for the French version (Multilingual PubMed-French). METHODS: To create this tool, translations of MeSH were enriched (eg, adding synonyms and translations in French) and integrated into a terminology portal. PubMed subsets in several European languages were also added to our database using a dedicated parser. The response time for the generic semantic search engine was evaluated for simple queries. BabelMeSH, Multilingual PubMed-French, and 3 different PubMed strategies were compared by searching for literature in French. Precision and coverage were measured for 20 randomly selected queries. The results were evaluated as relevant to title and abstract, the evaluator being blind to search strategy. RESULTS: More than 650,000 PubMed citations in French were integrated into the Multilingual PubMed-French information system. The response times were all below the threshold defined for usability (2 seconds). Two search strategies (Multilingual PubMed-French and 1 PubMed strategy) showed high precision (0.93 and 0.97, respectively), but coverage was 4 times higher for Multilingual PubMed-French. CONCLUSIONS: It is now possible to freely access biomedical literature using a practical search tool in French. This tool will be of particular interest for health professionals and other end users who do not read or query sufficiently in English. The information system is theoretically well suited to expand the approach to other European languages, such as German, Spanish, Norwegian, and Portuguese.


Subject(s)
Information Storage and Retrieval/methods; Language; PubMed/statistics & numerical data; Search Engine/statistics & numerical data; France; Humans; Medical Subject Headings
5.
JMIR Med Educ ; 10: e48393, 2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38437007

ABSTRACT

BACKGROUND: Access to reliable and accurate digital health web-based resources is crucial. However, the lack of dedicated search engines for non-English languages, such as French, is a significant obstacle in this field. Thus, we developed and implemented a multilingual, multiterminology semantic search engine called Catalog and Index of Digital Health Teaching Resources (CIDHR). CIDHR is freely accessible to everyone, with a focus on French-speaking resources. CIDHR has been initiated to provide validated, high-quality content tailored to the specific needs of each user profile, be it students or professionals. OBJECTIVE: This study's primary aim in developing and implementing the CIDHR is to improve knowledge sharing and spreading in digital health and health informatics and expand the health-related educational community, primarily French speaking but also in other languages. We intend to support the continuous development of initial (ie, bachelor level), advanced (ie, master and doctoral levels), and continuing training (ie, professionals and postgraduate levels) in digital health for health and social work fields. The main objective is to describe the development and implementation of CIDHR. The hypothesis guiding this research is that controlled vocabularies dedicated to medical informatics and digital health, such as the Medical Informatics Multilingual Ontology (MIMO) and the concepts structuring the French National Referential on Digital Health (FNRDH), to index digital health teaching and learning resources, are effectively increasing the availability and accessibility of these resources to medical students and other health care professionals. METHODS: First, resource identification is processed by medical librarians from websites and scientific sources preselected and validated by domain experts and surveyed every week. Then, based on MIMO and FNRDH, the educational resources are indexed for each related knowledge domain. 
The same resources are also tagged with relevant academic and professional experience levels. Afterward, the indexed resources are shared with the digital health teaching and learning community. The last step consists of assessing CIDHR by obtaining informal feedback from users. RESULTS: Resource identification and evaluation processes were executed by a dedicated team of medical librarians, aiming to collect and curate an extensive collection of digital health teaching and learning resources. The resources that successfully passed the evaluation process were promptly included in CIDHR. These resources were diligently indexed (with MIMO and FNRDH) and tagged for the study field and degree level. By October 2023, a total of 371 indexed resources were available on a dedicated portal. CONCLUSIONS: CIDHR is a multilingual digital health education semantic search engine and platform that aims to increase the accessibility of educational resources to the broader health care-related community. It focuses on making resources "findable," "accessible," "interoperable," and "reusable" by using a one-stop shop portal approach. CIDHR has and will have an essential role in increasing digital health literacy.


Subject(s)
Digital Health; Semantics; Humans; Search Engine; Language; Learning
6.
Int J Med Inform ; 170: 104976, 2023 02.
Article in English | MEDLINE | ID: mdl-36599261

ABSTRACT

INTRODUCTION: The cytochrome P450 (CYP450) enzyme system is involved in the metabolism of certain drugs and is responsible for most drug interactions. These interactions result in either an enzymatic inhibition or an enzymatic induction mechanism that has an impact on the therapeutic management of patients. Detecting these drug interactions will allow for better predictability in therapeutic response. Therefore, computerized solutions can represent a valuable help for clinicians in their tasks of detection. OBJECTIVE: The objective of this study is to provide a structured data-source of interactions involving the CYP450 enzyme system. These interactions are aimed to be integrated in the cross-lingual multi-terminology server HeTOP (Health Terminologies and Ontologies Portal), to support the query processing of the clinical data warehouse (CDW) EDSaN (Entrepôt de Données de Santé Normand). MATERIAL AND METHODS: A selection and curation of drug components (DCs) that share a relationship with the CYP450 system was performed from several international data sources. The DCs were linked according to the type of relationship which can be substrate, inhibitor, or inducer. These relationships were then integrated into the HeTOP server. To validate the CYP450 relationships, a semantic query was performed on the CDW, whose search engine is founded on HeTOP data (concepts, terms, and relations). RESULTS: A total of 776 DCs are associated by a new interaction relationship, integrated in HeTOP, by 14 enzymes. These are CYP450 1A2, 2A6, 2B6, 2C8, 2C9, 2C18, 2C19, 2D6, 2E1, 3A4, 3A7, 11B1,11B2 mitochondrial and P-glycoprotein, constituting a total of 2,088 relationships. A general modelling of cytochromic interactions was performed. From this model, 233,006 queries were processed in less than two hours, demonstrating the usefulness and performance of our CDW implementation. 
Moreover, they showed that in our university hospital, the concurrent prescription that could cause a cytochromic interaction is Bisoprolol with Amiodarone, by enzymatic inhibition, for 2,493 patients. DISCUSSION: The queries submitted to the CDW EDSaN made it possible to highlight the molecules that are most often prescribed simultaneously and are potentially responsible for cytochromic interactions. In a second step, it would be interesting to evaluate the real clinical impact by looking for possible adverse effects of these interactions in the patients' files. Other computational solutions for cytochromic interactions exist. The impact of CYP450 is particularly important for drugs with a narrow therapeutic window (NTW), as interactions can lead to increased toxicity or therapeutic failure. It is also important to define which drug components are pro-drugs and to consider the many genetic polymorphisms of patients. CONCLUSION: The HeTOP server contains a non-negligible number of relationships between drug components and CYP450 from multiple reference sources. These data allow us to query our Clinical Data Warehouse to highlight these cytochromic interactions. It would be interesting in the future to assess the actual clinical impact in hospital reports.
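
The substrate / inhibitor / inducer modelling described in this abstract lends itself to a simple pairwise check. The sketch below is an assumption-laden illustration, not the EDSaN query engine: the relation triples, drug names and function name are hypothetical and carry no pharmacological meaning; only the three relationship types and the enzyme naming come from the abstract.

```python
from itertools import permutations

# Hypothetical (drug component, role, enzyme) triples, for illustration only.
RELATIONS = [
    ("drug_A", "inhibitor", "CYP450 2D6"),
    ("drug_B", "substrate", "CYP450 2D6"),
    ("drug_C", "inducer", "CYP450 3A4"),
    ("drug_D", "substrate", "CYP450 1A2"),
]


def potential_interactions(prescribed, relations):
    """List (perpetrator, victim, enzyme, mechanism) for co-prescribed drugs.

    A potential interaction is flagged when one drug inhibits or induces
    an enzyme of which another co-prescribed drug is a substrate.
    """
    found = []
    for perpetrator, victim in permutations(sorted(prescribed), 2):
        for drug1, role1, enzyme1 in relations:
            if drug1 != perpetrator or role1 == "substrate":
                continue
            for drug2, role2, enzyme2 in relations:
                if drug2 == victim and role2 == "substrate" and enzyme2 == enzyme1:
                    mechanism = "inhibition" if role1 == "inhibitor" else "induction"
                    found.append((perpetrator, victim, enzyme1, mechanism))
    return found


hits = potential_interactions({"drug_A", "drug_B", "drug_D"}, RELATIONS)
```

Running one such check per concurrent prescription is one plausible way to read the "general modelling" mentioned above; the actual query model may differ.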


Subject(s)
Cytochrome P-450 Enzyme System; Data Warehousing; Humans; Cytochrome P-450 Enzyme System/genetics; Cytochrome P-450 Enzyme System/metabolism
7.
Stud Health Technol Inform ; 180: 949-53, 2012.
Article in English | MEDLINE | ID: mdl-22874333

ABSTRACT

UNLABELLED: The Health Terminology/Ontology Portal (HeTOP) was developed to provide easy access to health terminologies and ontologies. The repository is not only dedicated to professionals but is also a valuable teaching tool. Currently, it provides access to thirty-two health terminologies and ontologies available mainly in French or in English, but also in German, Italian, Chinese, etc. HeTOP can be used by both humans and computers via Web services. To integrate new resources into HeTOP, three steps are necessary: (1) designing a meta-model into which each terminology (or ontology) can be integrated, (2) developing a process to include terminologies into HeTOP, (3) building and integrating existing and new inter- and intra-terminology semantic harmonization into HeTOP. Currently, 600 unique machines use the MeSH version of HeTOP every day and restricted terminologies/ontologies are used for teaching purposes in several medical schools in France. The multilingual version of HeTOP is available (URL: http://hetop.eu/) and provides free access to ICD10 and FMA in ten languages. CONCLUSION: HeTOP is a rich tool, useful for a wide range of applications and users, especially in education and resource indexing but also in information retrieval or performing audits in terminology management.


Subject(s)
Computer-Assisted Instruction/methods; Education, Medical/methods; Internet; Terminology as Topic; Vocabulary, Controlled; Europe
8.
Stud Health Technol Inform ; 294: 38-42, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612012

ABSTRACT

The frequency of potential drug-drug interactions (DDIs) in published studies on real-world data varies considerably depending on the methodological framework. Contextualization of DDIs has a proven effect in limiting false positives. In this paper, we experimented with the application of various DDI context elements to see their impact on the frequency of potential DDIs measured on the same set of prescription data collected in EDSaN, the clinical data warehouse of Rouen University Hospital. Depending on the context applied, the frequency of daily prescriptions with a potential DDI ranged from 0.89% to 3.90%. Substance-level analysis accounted for 48% of false positives because it did not account for some drug-related attributes. Consideration of the patient's context could eliminate up to an additional 29% of false positives.


Subject(s)
Drug-Related Side Effects and Adverse Reactions; Drug Interactions; Drug-Related Side Effects and Adverse Reactions/prevention & control; Humans
9.
Stud Health Technol Inform ; 289: 260-263, 2022 Jan 14.
Article in English | MEDLINE | ID: mdl-35062142

ABSTRACT

The Normandy health data warehouse EDSaN integrates the medication orders from the University Hospital of Rouen (France). This study aims at describing the design and the evaluation of an information retrieval system founded on a complex and semantically augmented knowledge graph dedicated to EDSaN drug prescriptions. The system is intended to help health professionals select drugs during the search process. The manual evaluation of the relevance of the returned drugs showed encouraging results, as expected. A deeper analysis is needed to improve the ranking method and will be performed in future work.


Subject(s)
Pattern Recognition, Automated; Pharmaceutical Preparations; France; Humans; Information Storage and Retrieval; Knowledge
10.
Stud Health Technol Inform ; 298: 19-23, 2022 Aug 31.
Article in English | MEDLINE | ID: mdl-36073449

ABSTRACT

The aim of this paper is to present the use of the Medical Informatics Multilingual Ontology (MIMO) to index digital health resources that are (and will be) included in SaNuRN (a project to teach digital health). MIMO currently contains 1,379 concepts and is integrated into HeTOP, which is a cross-lingual multiterminology server. Existing teaching resources have been reindexed with MIMO concepts and integrated into a dedicated website. A total of 345 resources have been indexed with MIMO concepts and are freely available at https://doccismef.chu-rouen.fr/dc/#env=sanurn. The development of a multilingual MIMO for enhancing the quality and the efficiency of international projects is challenging. A specific semantic search engine has been deployed to give access to digital health teaching resources.


Subject(s)
Medical Informatics; Multilingualism; Search Engine; Semantics
11.
BMC Med Inform Decis Mak ; 11: 3, 2011 Jan 21.
Article in English | MEDLINE | ID: mdl-21255439

ABSTRACT

BACKGROUND: General practitioners and medical specialists mainly rely on one "general medical" journal to keep their medical knowledge up to date. Nevertheless, it is not known if these journals display the same overview of the medical knowledge in different specialties. The aims of this study were to measure the relative weight of the different specialties in the major journals of general medicine, to evaluate the trends in these weights over a ten-year period and to compare the journals. METHODS: The 14,091 articles published in The Lancet, the NEJM, the JAMA and the BMJ in 1997, 2002 and 2007 were analyzed. The relative weight of the medical specialities was determined by categorization of all the articles, using a categorization algorithm which inferred the medical specialties relevant to each article MEDLINE file from the MeSH terms used by the indexers of the US National Library of Medicine to describe each article. RESULTS: The 14,091 articles included in our study were indexed by 22,155 major MeSH terms, which were categorized into 81 different medical specialties. Cardiology and Neurology were in the first 3 specialties in the 4 journals. Five and 15 specialties were systematically ranked in the first 10 and first 20 in the four journals respectively. Among the first 30 specialties, 23 were common to the four journals. For each speciality, the trends over a 10-year period were different from one journal to another, with no consistency and no obvious explanatory factor. CONCLUSIONS: Overall, the representation of many specialties in the four journals in general and internal medicine included in this study may differ, probably due to different editorial policies. Reading only one of these journals may provide a reliable but only partial overview.


Subject(s)
Bibliometrics; Medicine/statistics & numerical data; Periodicals as Topic/statistics & numerical data; Specialties, Surgical/statistics & numerical data; United States
12.
Stud Health Technol Inform ; 166: 129-38, 2011.
Article in English | MEDLINE | ID: mdl-21685618

ABSTRACT

Since the mid-90s, several quality-controlled health gateways have been developed. In France, CISMeF is the leading health gateway. It indexes Internet resources from the main institutions, using the MeSH thesaurus and the Dublin Core metadata element set. Since 2005, the CISMeF Information System (IS) has included 24 health terminologies, classifications and thesauri for indexing and information retrieval. This work aims at creating a Health Multi-Terminology Portal (HMTP) and connecting it to the CISMeF Terminology Database, mainly for searching concepts and terms among all the health controlled vocabularies available in French (or in English and translated into French) and browsing them dynamically. To integrate the terminologies in the CISMeF IS, three steps are necessary: (1) designing a meta-model into which each terminology can be integrated, (2) developing a process to include terminologies into the HMTP, (3) building and integrating existing and new inter-terminology mappings into the HMTP. A total of 24 terminologies are included in the HMTP, with 575,300 concepts, 852,000 synonyms, 222,800 definitions and 1,180,000 relations. Eighteen of these terminologies are not yet included in the UMLS, among them some from the World Health Organization. Since January 2010, the HMTP has been used daily by CISMeF librarians to index in multi-terminology mode. A health multiterminology portal is a valuable tool helping the indexing and the retrieval of resources from a quality-controlled patient safety gateway. It can also be very useful for teaching or performing audits in terminology management.


Subject(s)
Documentation/methods; Information Storage and Retrieval/methods; Safety Management/organization & administration; Semantics; Terminology as Topic; Hospital Administration; Humans; Internet
13.
Stud Health Technol Inform ; 169: 492-6, 2011.
Article in English | MEDLINE | ID: mdl-21893798

ABSTRACT

BACKGROUND: Following a recent change in the indexing policy for the French quality-controlled health gateway CISMeF, multiple terminologies are now being used for indexing in addition to MeSH®. OBJECTIVE: To evaluate precision and recall of super-concepts for information retrieval in a multi-terminology paradigm compared to MeSH-only. METHODS: We evaluate the relevance of resources retrieved by multi-terminology super-concept and MeSH-only super-concept queries. RESULTS: Recall was 8-14% higher for multi-terminology super-concepts compared to MeSH-only super-concepts. Precision decreased from 0.66 for MeSH-only super-concepts to 0.61 for multi-terminology super-concepts. Retrieval performance was found to vary significantly depending on the super-concepts (p<10^-4) and indexing methods (manual vs automatic; p<0.004). CONCLUSION: A multi-terminology paradigm contributes to increased recall but lowers precision. Automated tools for indexing are not accurate enough to allow very precise information retrieval.


Subject(s)
Abstracting and Indexing; Information Storage and Retrieval/methods; Medical Informatics/methods; Algorithms; Catalogs as Topic; Electronic Data Processing; Humans; Internet; Medical Subject Headings; Reproducibility of Results; Software; Statistics as Topic; Terminology as Topic
14.
Stud Health Technol Inform ; 160(Pt 1): 610-4, 2010.
Article in English | MEDLINE | ID: mdl-20841759

ABSTRACT

The lack of interoperability between repositories of heterogeneous and geographically widespread data is an obstacle to the diffusion, sharing and reutilization of those data. We present the development of an open repositories network that takes into account both the syntactic and semantic interoperability of the different repositories and is based on international standards in this field. The network is used by the medical community in France for the diffusion and sharing of digital teaching resources. The syntactic interoperability of the repositories is managed using the OAI-PMH protocol for the exchange of metadata describing the resources. Semantic interoperability is based, on the one hand, on the LOM standard for the description of resources and on MeSH for their indexing and, on the other hand, on semantic interoperability management designed to optimize compliance with standards and the quality of the metadata.
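
As a rough illustration of the metadata exchange mentioned above, the sketch below builds an OAI-PMH `ListRecords` request URL. The base URL and function name are hypothetical, and `oai_dc` is used because it is the metadata format every OAI-PMH repository must support; the prefix under which a repository exposes LOM records is repository-specific and is not stated in the abstract.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a harvesting client."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional selective harvesting by set.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint, for illustration only.
url = list_records_url("https://repository.example.org/oai")
```

A harvester would issue an HTTP GET on this URL and parse the XML response; that part is omitted here to keep the sketch free of network access.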


Subject(s)
Computer-Assisted Instruction/methods; Education, Medical/organization & administration; Information Dissemination/methods; Internet/organization & administration; User-Computer Interface; France
15.
Med Teach ; 31(4): e162-8, 2009 Apr.
Article in English | MEDLINE | ID: mdl-19404888

ABSTRACT

BACKGROUND: At the end of undergraduate medical curriculum, a written simulation-based examination is used in France to assess therapeutic decision-making skills and to rank students for the purpose of matching their training specialties. However, this examination based on a single assessment method remains a subject of debate. AIM: To study the feasibility of a web-based Concordance test for therapeutic decision-making assessment. METHODS: A 12 clinical-case Concordance test was developed based on objectives for the undergraduate training program. The test was administered on line to candidates with different levels of clinical experience. Fifteen therapeutic teachers constituted the reference panel. Data analysis included analysis of variance, post-hoc test, and Cronbach's alpha. RESULTS: One hundred and seventy participants (113 students, 34 residents, 23 physicians) fully completed the free-access test on line with no technical problems. Differences between the mean scores for groups were significant (p < 0.001). Significant differences occurred between fourth year students and residents (p < 0.001), fourth year students and physicians (p = 0.001). No difference was found between residents and physicians. Reliability coefficient was 0.67. CONCLUSION: A web-based Concordance test in the field of therapeutic decision-making was considered feasible in a French learning environment. Further research is warranted to determine its usefulness as a part of the National Examination.
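
The reliability coefficient reported above is Cronbach's alpha. Here is a minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), on toy data; the score matrix is invented for illustration and has nothing to do with the study's results.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha; `scores` has one row per participant, one column per item."""
    k = len(scores[0])
    # Sample variance of each item across participants.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each participant's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data: two items that rank three participants identically,
# so the items are perfectly consistent.
toy = [[1, 1], [2, 2], [3, 3]]
alpha = cronbach_alpha(toy)
```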


Subject(s)
Decision Making; Educational Measurement/methods; Internet; Therapeutics; Clinical Competence; Education, Medical, Undergraduate; Evidence-Based Medicine; Feasibility Studies; France; Humans; Physicians; Problem-Based Learning
16.
Stud Health Technol Inform ; 148: 112-22, 2009.
Article in English | MEDLINE | ID: mdl-19745241

ABSTRACT

OBJECTIVE: The objective of this work is to create a bilingual (French/English) Drug Information Portal (DIP) in a multi-terminological context and to enhance it with automatic ATC indexing, which provides more pertinent information about the substances, organs or systems on which drugs act and about their therapeutic and chemical characteristics. METHODS: The development of the DIP was based on the CISMeF portal, which catalogues and indexes the most important and quality-controlled sources of institutional health information in French. The DIP provides specific functionalities and uses specific drug terminologies such as the ATC classification, which is used to automatically index the DIP resources. RESULTS: The DIP is the result of collaboration between the CISMeF team and the VIDAL company, which specializes in drug information. The DIP is designed to facilitate information retrieval by users. The automatic ATC indexing provided relevant results in 76% of cases. CONCLUSION: In the drug field and in a multi-terminological context, indexing drugs with the appropriate codes and/or terms proved to be very important for appropriate information storage and retrieval. The main challenge in the coming year is to increase the accuracy of the approach.


Subject(s)
Abstracting and Indexing; Automation; Drug Information Services; Pharmaceutical Preparations; Europe; France; Internet
17.
Stud Health Technol Inform ; 150: 497-501, 2009.
Article in English | MEDLINE | ID: mdl-19745361

ABSTRACT

The objective of this work is the creation of a bilingual (French/English) drug information portal (DIP) in a multi-terminological context. The development of the DIP was based on the CISMeF portal, which catalogues and indexes the most important and quality-controlled sources of institutional health information in French. The DIP provides specific drug-related functionalities and uses specific drug terminologies and classifications: the ATC classification, the CAS numbers, the French CIS and CIP codes, as well as the trade names and International Nonproprietary Names of the drugs. The DIP is the result of collaboration between the CISMeF team and the private company VIDAL, which specializes in drug information. The DIP is designed to facilitate information retrieval by users across several health terminologies. In the drug field and in a multi-terminological context, indexing drugs with the appropriate codes and/or terms proved to be very important for appropriate information storage and retrieval.


Subject(s)
Information Storage and Retrieval; Internet; Pharmaceutical Preparations; Terminology as Topic; Europe; Humans; Safety Management
18.
JMIR Med Inform ; 7(3): e12310, 2019 Jul 29.
Article in English | MEDLINE | ID: mdl-31359873

ABSTRACT

BACKGROUND: Word embedding technologies, a set of language modeling and feature learning techniques in natural language processing (NLP), are now used in a wide range of applications. However, no formal evaluation and comparison have been made on the ability of each of the 3 current most famous unsupervised implementations (Word2Vec, GloVe, and FastText) to keep track of the semantic similarities existing between words, when trained on the same dataset. OBJECTIVE: The aim of this study was to compare embedding methods trained on a corpus of French health-related documents produced in a professional context. The best method will then help us develop a new semantic annotator. METHODS: Unsupervised embedding models have been trained on 641,279 documents originating from the Rouen University Hospital. These data are not structured and cover a wide range of documents produced in a clinical setting (discharge summary, procedure reports, and prescriptions). In total, 4 rated evaluation tasks were defined (cosine similarity, odd one, analogy-based operations, and human formal evaluation) and applied on each model, as well as embedding visualization. RESULTS: Word2Vec had the highest score on 3 out of 4 rated tasks (analogy-based operations, odd one similarity, and human validation), particularly regarding the skip-gram architecture. CONCLUSIONS: Although this implementation had the best rate for semantic properties conservation, each model has its own qualities and defects, such as the training time, which is very short for GloVe, or morphological similarity conservation observed with FastText. Models and test sets produced by this study will be the first to be publicly available through a graphical interface to help advance the French biomedical research.
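
Two of the evaluation tasks named above, cosine similarity and "odd one", can be sketched in a few lines. The definition of the odd-one task used here (the word with the lowest mean similarity to the others) is an assumption, since the abstract does not spell it out, and the two-dimensional vectors are toy values rather than trained embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def odd_one_out(words, vectors):
    """Return the word with the lowest mean cosine similarity to the others."""
    def mean_similarity(word):
        others = [w for w in words if w != word]
        return sum(cosine(vectors[word], vectors[w]) for w in others) / len(others)
    return min(words, key=mean_similarity)


# Toy 2-D vectors (hypothetical): two drug names close together, one anatomy term apart.
VECTORS = {
    "aspirin": [1.0, 0.1],
    "ibuprofen": [0.9, 0.2],
    "femur": [0.1, 1.0],
}
odd = odd_one_out(list(VECTORS), VECTORS)
```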

19.
Stud Health Technol Inform ; 264: 118-122, 2019 Aug 21.
Article in English | MEDLINE | ID: mdl-31437897

ABSTRACT

Structuring raw medical documents with ontology mapping is now the next step for medical intelligence. Deep learning models take as input mathematically embedded information, such as encoded texts. To do so, word embedding methods can represent every word from a text as a fixed-length vector. A formal evaluation of three word embedding methods has been performed on raw medical documents. The data corresponds to more than 12M diverse documents produced in the Rouen hospital (drug prescriptions, discharge and surgery summaries, inter-services letters, etc.). Automatic and manual validation demonstrates that Word2Vec based on the skip-gram architecture had the best rate on three out of four accuracy tests. This model will now be used as the first layer of an AI-based semantic annotator.


Subject(s)
Language; Natural Language Processing; Deep Learning; Semantics
20.
JMIR Res Protoc ; 8(5): e11448, 2019 May 07.
Article in English | MEDLINE | ID: mdl-31066711

ABSTRACT

BACKGROUND: Social media is a potential source of information on postmarketing drug safety surveillance that still remains unexploited nowadays. Information technology solutions aiming at extracting adverse reactions (ADRs) from posts on health forums require a rigorous evaluation methodology if their results are to be used to make decisions. First, a gold standard, consisting of manual annotations of the ADR by human experts from the corpus extracted from social media, must be implemented and its quality must be assessed. Second, as for clinical research protocols, the sample size must rely on statistical arguments. Finally, the extraction methods must target the relation between the drug and the disease (which might be either treated or caused by the drug) rather than simple co-occurrences in the posts. OBJECTIVE: We propose a standardized protocol for the evaluation of a software extracting ADRs from the messages on health forums. The study is conducted as part of the Adverse Drug Reactions from Patient Reports in Social Media project. METHODS: Messages from French health forums were extracted. Entity recognition was based on Racine Pharma lexicon for drugs and Medical Dictionary for Regulatory Activities terminology for potential adverse events (AEs). Natural language processing-based techniques automated the ADR information extraction (relation between the drug and AE entities). The corpus of evaluation was a random sample of the messages containing drugs and/or AE concepts corresponding to recent pharmacovigilance alerts. A total of 2 persons experienced in medical terminology manually annotated the corpus, thus creating the gold standard, according to an annotator guideline. We will evaluate our tool against the gold standard with recall, precision, and f-measure. Interannotator agreement, reflecting gold standard quality, will be evaluated with hierarchical kappa. Granularities in the terminologies will be further explored. 
RESULTS: Necessary and sufficient sample size was calculated to ensure statistical confidence in the assessed results. As we expected a global recall of 0.5, we needed at least 384 identified ADR concepts to obtain a 95% CI with a total width of 0.10 around 0.5. The automated ADR information extraction in the corpus for evaluation is already finished. The 2 annotators already completed the annotation process. The analysis of the performance of the ADR information extraction module as compared with gold standard is ongoing. CONCLUSIONS: This protocol is based on the standardized statistical methods from clinical research to create the corpus, thus ensuring the necessary statistical power of the assessed results. Such evaluation methodology is required to make the ADR information extraction software useful for postmarketing drug safety surveillance. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR1-10.2196/11448.
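
The sample size quoted in the RESULTS can be reproduced with the usual normal-approximation formula for a proportion, n = z^2 * p * (1 - p) / d^2, where d is the half-width of the confidence interval. That this is the formula the authors used is an assumption; the abstract only gives the inputs (expected recall 0.5, 95% CI, total width 0.10) and the resulting figure.

```python
def sample_size_for_proportion(p, total_width, z=1.96):
    """Sample size for estimating a proportion p with a CI of the given total width.

    Uses the normal (Wald) approximation; z=1.96 corresponds to a 95% CI.
    """
    half_width = total_width / 2
    return z ** 2 * p * (1 - p) / half_width ** 2


# Inputs taken from the abstract: expected recall 0.5, total CI width 0.10.
n = sample_size_for_proportion(p=0.5, total_width=0.10)
```

With these inputs the formula gives approximately 384.16, in line with the 384 identified ADR concepts mentioned in the abstract.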
