Results 1 - 20 of 32
1.
Stud Health Technol Inform ; 316: 601-605, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176814

ABSTRACT

Generative Large Language Models (LLMs) have become ubiquitous in various fields, including healthcare and medicine. Consequently, there is growing interest in leveraging LLMs for medical applications, leading to the emergence of novel models daily. However, evaluation and benchmarking frameworks for LLMs are scarce, particularly those tailored for medical French. To address this gap, we introduce a minimal benchmark consisting of 114 open questions designed to assess the medical capabilities of LLMs in French. The proposed benchmark encompasses a wide range of medical domains, reflecting real-world clinical scenarios' complexity. A preliminary validation involved testing seven widely used LLMs with a parameter size of 7 billion. Results revealed significant variability in performance, emphasizing the importance of rigorous evaluation before deploying LLMs in medical settings. In conclusion, we present a novel and valuable resource for rapidly evaluating LLMs in medical French. By promoting greater accountability and standardization, this benchmark has the potential to enhance trustworthiness and utility in harnessing LLMs for medical applications.


Subject(s)
Benchmarking , Computer Simulation , France
2.
Stud Health Technol Inform ; 316: 1647-1651, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176526

ABSTRACT

Similarity and clustering tasks based on patient-level data extracted from electronic health records suffer from the curse of dimensionality and the lack of inter-patient data comparability. Indeed, in many health institutions, there are many more variables, and ways of expressing those variables, to represent patients than there are patients sharing the same set of data. To lower redundancy and increase interoperability, one strategy is to map data to semantic-driven representations through medical knowledge graphs such as SNOMED CT. However, patient similarity metrics based on this knowledge-graph information lack quantitative evaluation and comparisons with purely data-driven methods. The reasons are twofold. First, it is hard to conceptually assess and formalize a gold-standard similarity between patients, resulting in poor inter-annotator agreement in qualitative evaluations. Second, the community has lacked a clear benchmark to compare existing metrics developed by scientific communities from various fields such as ontology, data science, and medical informatics. This study addresses the known challenges of evaluating patient similarities by proposing SIMpat, a synthetic benchmark to quantitatively evaluate available metrics, based on controlled cohorts, which could later be used to assess their sensitivity to aspects such as the sparsity of variables or specificities of patient disease patterns.


Subject(s)
Benchmarking , Electronic Health Records , Humans , Systematized Nomenclature of Medicine , Semantics
3.
Stud Health Technol Inform ; 316: 1363-1367, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176634

ABSTRACT

Representing numeric values such as scalars holds great importance for accurately depicting clinical data. While the result value itself will always be represented using an integer, decimal, or other scalar format, it needs to be linked to its corresponding data element. In SNOMED CT, as in most other terminology systems, this is done through an attribute relationship. While some scalar values are already included in this way, they only represent a small fraction of possibilities. Our intention is to expand the scope of scalar representation by validating new attributes using a previously established method. The result is a list of five attributes validated for local representation of scalar values, improving semantic representation and interoperability.


Subject(s)
Semantics , Systematized Nomenclature of Medicine , Humans , Electronic Health Records , Terminology as Topic
4.
Stud Health Technol Inform ; 316: 214-215, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176711

ABSTRACT

Automatic extraction of body-text within clinical PDF documents is necessary to enhance downstream NLP tasks but remains a challenge. This study presents an unsupervised algorithm designed to extract body-text by leveraging a large volume of data. Using DBSCAN clustering over aggregate pages, our method extracts and organizes text blocks using their content and coordinates. Evaluation results demonstrate precision scores ranging from 0.82 to 0.98, recall scores from 0.62 to 0.94, and F1-scores from 0.71 to 0.96 across various medical specialty sources. Future work includes dynamic parameter adjustments for improved accuracy and using larger datasets.


Subject(s)
Natural Language Processing , Algorithms , Data Mining/methods , Humans , Electronic Health Records , Unsupervised Machine Learning
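
Note on entry 4: the abstract reports precision, recall, and F1 ranges without stating how they relate. The F1-score is the harmonic mean of precision and recall, and the short sketch below shows that calculation. Pairing the lowest reported precision with the lowest reported recall (and likewise for the highest) is an assumption made only for illustration; the abstract does not say that these values come from the same source.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Assumption: the range endpoints are paired low-with-low and high-with-high.
f1_low = f1_score(0.82, 0.62)
f1_high = f1_score(0.98, 0.94)
print(round(f1_low, 2), round(f1_high, 2))
```

Under that pairing assumption, the rounded results line up with the F1 range quoted in the abstract.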
5.
Sci Data ; 11(1): 455, 2024 May 04.
Article in English | MEDLINE | ID: mdl-38704422

ABSTRACT

Due to the complexity of the biomedical domain, the ability to capture semantically meaningful representations of terms in context is a long-standing challenge. Despite important progress in the past years, no evaluation benchmark has been developed to evaluate how well language models represent biomedical concepts according to their corresponding context. Inspired by the Word-in-Context (WiC) benchmark, in which word sense disambiguation is reformulated as a binary classification task, we propose a novel dataset, BioWiC, to evaluate the ability of language models to encode biomedical terms in context. BioWiC comprises 20'156 instances, covering over 7'400 unique biomedical terms, making it the largest WiC dataset in the biomedical domain. We evaluate BioWiC both intrinsically and extrinsically and show that it could be used as a reliable benchmark for evaluating context-dependent embeddings in biomedical corpora. In addition, we conduct several experiments using a variety of discriminative and generative large language models to establish robust baselines that can serve as a foundation for future research.


Subject(s)
Natural Language Processing , Semantics , Language
6.
JMIR Med Inform ; 11: e44639, 2023 Nov 28.
Article in English | MEDLINE | ID: mdl-38015588

ABSTRACT

BACKGROUND: Information overflow, a common problem in the present clinical environment, can be mitigated by summarizing clinical data. Although there are several solutions for clinical summarization, there is a lack of a complete overview of the research relevant to this field. OBJECTIVE: This study aims to identify state-of-the-art solutions for clinical summarization, to analyze their capabilities, and to identify their properties. METHODS: A scoping review of articles published between 2005 and 2022 was conducted. With a clinical focus, PubMed and Web of Science were queried to find an initial set of reports, later extended by articles found through a chain of citations. The included reports were analyzed to answer the questions of where, what, and how medical information is summarized; whether summarization conserves temporality, uncertainty, and medical pertinence; and how the propositions are evaluated and deployed. To answer how information is summarized, methods were compared through a new framework "collect-synthesize-communicate" referring to information gathering from data, its synthesis, and communication to the end user. RESULTS: Overall, 128 articles were included, representing various medical fields. Exclusively structured data were used as input in 46.1% (59/128) of papers, text in 41.4% (53/128) of articles, and both in 10.2% (13/128) of papers. Using the proposed framework, 42.2% (54/128) of the records contributed to information collection, 27.3% (35/128) contributed to information synthesis, and 46.1% (59/128) presented solutions for summary communication. Numerous summarization approaches have been presented, including extractive (n=13) and abstractive summarization (n=19); topic modeling (n=5); summary specification (n=11); concept and relation extraction (n=30); visual design considerations (n=59); and complete pipelines (n=7) using information extraction, synthesis, and communication. 
Graphical displays (n=53), short texts (n=41), static reports (n=7), and problem-oriented views (n=7) were the most common types in terms of summary communication. Although temporality and uncertainty information were usually not conserved in most studies (74/128, 57.8% and 113/128, 88.3%, respectively), some studies presented solutions to treat this information. Overall, 115 (89.8%) articles showed results of an evaluation, and methods included evaluations with human participants (median 15, IQR 24 participants): measurements in experiments with human participants (n=31), real situations (n=8), and usability studies (n=28). Methods without human involvement included intrinsic evaluation (n=24), performance on a proxy (n=10), or domain-specific tasks (n=11). Overall, 11 (8.6%) reports described a system deployed in clinical settings. CONCLUSIONS: The scientific literature contains many propositions for summarizing patient information but reports very few comparisons of these proposals. This work proposes to compare these algorithms through how they conserve essential aspects of clinical information and through the "collect-synthesize-communicate" framework. We found that current propositions usually address these 3 steps only partially. Moreover, they conserve and use temporality, uncertainty, and pertinent medical aspects to varying extents, and solutions are often preliminary.

7.
Sci Rep ; 13(1): 6013, 2023 Apr 12.
Article in English | MEDLINE | ID: mdl-37045983

ABSTRACT

Two successive COVID-19 flares occurred in Switzerland in spring and autumn 2020. During these periods, therapeutic strategies were constantly adapted based on emerging evidence. We aimed to describe these adaptations and evaluate their association with patient outcomes in a cohort of COVID-19 patients admitted to the hospital. Consecutive patients admitted to the Geneva Hospitals during two successive COVID-19 flares were included. Characteristics of patients admitted during these two periods were compared, as well as therapeutic management including medications, respiratory support strategies and admission to the ICU and intermediate care unit (IMCU). A multivariable model was computed to compare outcomes across the two successive waves, adjusted for demographic characteristics, co-morbidities and severity at baseline. The main outcome was in-hospital mortality. Secondary outcomes included ICU admission, intermediate care (IMCU) admission, and length of hospital stay. A total of 2'983 patients were included. Of these, 165 patients (16.3%, n = 1014) died during the first wave and 314 (16.0%, n = 1969) during the second (p = 0.819). The proportion of patients admitted to the ICU was lower in the second wave compared to the first (7.4 vs. 13.9%, p < 0.001) but their mortality was increased (33.6% vs. 25.5%, p < 0.001). Conversely, a greater proportion of patients was admitted to the IMCU in the second wave compared to the first (26.6% vs. 22.3%, p = 0.011). A third of patients received lopinavir (30.7%) or hydroxychloroquine (33.1%) during the first wave and none during the second wave, while corticosteroids were mainly prescribed during the second wave (58.1% vs. 9.1%, p < 0.001). In the multivariable analysis, a 25% reduction of mortality was observed during the second wave (HR 0.75; 95% confidence interval 0.59 to 0.96). Among deceased patients, 82.3% (78.2% during first wave and 84.4% during second wave) died without being admitted to the ICU.
The proportion of patients with therapeutic limitations regarding ICU admission increased during the second wave (48.6% vs. 38.7%, p < 0.001). Adaptation of therapeutic strategies, including corticosteroid therapy and higher admission to the IMCU to receive non-invasive respiratory support, was associated with a reduction of hospital mortality in multivariable analysis, ICU admission and LOS during the second wave of COVID-19 despite an increased number of admitted patients. More patients had medical decisions restraining ICU admission during the second wave, which may reflect better patient selection or implicit triaging.


Subject(s)
COVID-19 , Humans , COVID-19/epidemiology , COVID-19/therapy , Tertiary Care Centers , Switzerland/epidemiology , Hospitalization , Length of Stay , Intensive Care Units , Hospital Mortality , Retrospective Studies
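
Note on entry 7: the abstract compares crude in-hospital deaths between the two waves (165 of 1014 versus 314 of 1969) and reports p = 0.819, but it does not name the test used. The sketch below assumes an unadjusted Pearson chi-square test on the 2x2 table, without continuity correction; that choice is an assumption for illustration and not necessarily the authors' method. The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts taken from the abstract: deaths and survivors in each wave.
deaths_w1, total_w1 = 165, 1014
deaths_w2, total_w2 = 314, 1969
chi2, p_value = chi_square_2x2(
    deaths_w1, total_w1 - deaths_w1,
    deaths_w2, total_w2 - deaths_w2,
)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

With these counts the assumed test gives a p-value of roughly 0.82, in line with the non-significant crude difference reported. The adjusted hazard ratio in the abstract comes from a multivariable model and cannot be reproduced from these counts alone.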
8.
Exp Clin Endocrinol Diabetes ; 131(6): 338-344, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37015329

ABSTRACT

BACKGROUND: Hyperglycaemia is associated with worse outcomes in many settings. However, the association between dysglycaemia and adverse outcomes remains debated in COVID-19 patients. This study determined the association of prehospital blood glucose levels with acute medical unit (intensive care unit or high dependency unit) admission and mortality among COVID-19-infected patients. METHODS: This was a single-centre, retrospective cohort study based on patients cared for by the prehospital medical mobile unit from a Swiss university hospital between March 2020 and April 2021. All adult patients with confirmed or suspected COVID-19 infection during the study period were included. Data were obtained from the prehospital medical files. The main exposure was prehospital blood glucose level. A 7.8 mmol/L cut-off was used to define high blood glucose level. Restricted cubic splines were also used to analyse the exposure as a continuous variable. The primary endpoint was acute medical unit admission; secondary endpoints were 7-day and 30-day mortality. Multivariable logistic regressions were performed to compute odds ratios. RESULTS: A total of 276 patients were included. The mean prehospital blood glucose level was 8.8 mmol/l, and 123 patients presented high blood glucose levels. The overall acute medical unit admission rate was 31.2%, with no statistically significant difference according to prehospital blood glucose levels. The mortality rate was 13.8% at 7 days and 25% at 30 days. The 30-day mortality rate was higher in patients with high prehospital blood glucose levels, with an adjusted odds ratio of 2.5 (1.3-4.8). CONCLUSIONS: In patients with acute COVID-19 infection, prehospital blood glucose levels do not seem to be associated with acute medical unit admission. However, there was an increased risk of 30-day mortality in COVID-19 patients who presented high prehospital blood glucose levels.


Subject(s)
COVID-19 , Emergency Medical Services , Hyperglycemia , Adult , Humans , COVID-19/complications , Blood Glucose/analysis , Retrospective Studies , Hyperglycemia/epidemiology
9.
JMIR Med Inform ; 10(8): e41257, 2022 Aug 09.
Article in English | MEDLINE | ID: mdl-35944251

ABSTRACT

[This corrects the article DOI: 10.2196/29174.].

10.
Sci Rep ; 12(1): 14677, 2022 Aug 29.
Article in English | MEDLINE | ID: mdl-36038578

ABSTRACT

Abdominal pain and liver injury have been frequently reported during coronavirus disease-2019 (COVID-19). Our aim was to investigate characteristics of abdominal pain in COVID-19 patients and their association with disease severity and liver injury. Data of all COVID-19 patients hospitalized during the first wave in one hospital were retrieved. Patients admitted exclusively for other pathologies and/or recovered from COVID-19, as well as pregnant women, were excluded. Patients whose abdominal pain was related to an alternative diagnosis were also excluded. Among the 1026 included patients, 200 (19.5%) exhibited spontaneous abdominal pain and 165 (16.2%) pain on abdominal palpation. Spontaneous pain was most frequently localized in the epigastric (42.7%) and right upper quadrant (25.5%) regions. Tenderness in the right upper region was associated with severe COVID-19 (hospital mortality and/or admission to intensive/intermediate care unit) with an adjusted odds ratio of 2.81 (95% CI 1.27-6.21, p = 0.010). Patients with a history of lower abdominal pain experienced dyspnea less frequently than patients with a history of upper abdominal pain (25.8 versus 63.0%, p < 0.001). Baseline transaminase elevation was associated with a history of pain in the epigastric and right upper regions, and AST elevation was strongly associated with severe COVID-19 with an odds ratio of 16.03 (95% CI 1.95-131.63, p = 0.010). More than one fifth of patients admitted for COVID-19 presented abdominal pain. Those with pain located in the upper abdomen were more at risk of dyspnea, demonstrated more altered transaminases, and presented a higher risk of adverse outcomes.


Subject(s)
COVID-19 , Abdomen , Abdominal Pain/etiology , COVID-19/complications , Dyspnea , Female , Humans , Pregnancy , Retrospective Studies , SARS-CoV-2 , Transaminases
11.
BMJ Open Respir Res ; 9(1), 2022 Aug.
Article in English | MEDLINE | ID: mdl-36002181

ABSTRACT

BACKGROUND: The SARS-CoV-2 pandemic led to a steep increase in hospital and intensive care unit (ICU) admissions for acute respiratory failure worldwide. Early identification of patients at risk of clinical deterioration is crucial in terms of appropriate care delivery and resource allocation. We aimed to evaluate and compare the prognostic performance of Sequential Organ Failure Assessment (SOFA), Quick Sequential Organ Failure Assessment (qSOFA), Confusion, Uraemia, Respiratory Rate, Blood Pressure and Age ≥65 (CURB-65), Respiratory Rate and Oxygenation (ROX) index and Coronavirus Clinical Characterisation Consortium (4C) score to predict death and ICU admission among patients admitted to the hospital for acute COVID-19 infection. METHODS AND ANALYSIS: Consecutive adult patients admitted to the Geneva University Hospitals during two successive COVID-19 flares in spring and autumn 2020 were included. The discriminative performance of these prediction rules, obtained during the first 24 hours of hospital admission, was computed to predict death or ICU admission. We further excluded patients with therapeutic limitations and reported areas under the curve (AUCs) for 30-day mortality and ICU admission in sensitivity analyses. RESULTS: A total of 2122 patients were included. 216 patients (10.2%) required ICU admission and 303 (14.3%) died within 30 days post admission. 4C score had the best discriminatory performance to predict 30-day mortality (AUC 0.82, 95% CI 0.80 to 0.85), compared with SOFA (AUC 0.75, 95% CI 0.72 to 0.78), qSOFA (AUC 0.59, 95% CI 0.56 to 0.62), CURB-65 (AUC 0.75, 95% CI 0.72 to 0.78) and ROX index (AUC 0.68, 95% CI 0.65 to 0.72). ROX index had the greatest discriminatory performance (AUC 0.79, 95% CI 0.76 to 0.83) to predict ICU admission compared with 4C score (AUC 0.62, 95% CI 0.59 to 0.66), CURB-65 (AUC 0.60, 95% CI 0.56 to 0.64), SOFA (AUC 0.74, 95% CI 0.71 to 0.77) and qSOFA (AUC 0.59, 95% CI 0.55 to 0.62).
CONCLUSION: Scores including age and/or comorbidities (4C and CURB-65) have the best discriminatory performance to predict mortality among inpatients with COVID-19, while scores including quantitative assessment of hypoxaemia (SOFA and ROX index) perform best to predict ICU admission. Exclusion of patients with therapeutic limitations improved the discriminatory performance of prognostic scores relying on age and/or comorbidities to predict ICU admission.


Subject(s)
COVID-19 , Organ Dysfunction Scores , Adult , COVID-19/diagnosis , COVID-19/therapy , Cohort Studies , Humans , Inpatients , Prognosis , ROC Curve , Retrospective Studies , SARS-CoV-2
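
Note on entry 11: the abstract compares prognostic scores by their area under the ROC curve (AUC). As a minimal sketch of what that number measures, the AUC can be computed as the probability that a randomly chosen patient with the outcome has a higher score than a randomly chosen patient without it, with ties counted as one half. The function name `auc` and the score values below are hypothetical and are not data from the study.

```python
def auc(scores_with_outcome: list[float], scores_without_outcome: list[float]) -> float:
    """Rank-based AUC: share of (positive, negative) pairs ordered correctly.

    Ties contribute 0.5, which matches the Mann-Whitney U formulation.
    """
    wins = 0.0
    for pos in scores_with_outcome:
        for neg in scores_without_outcome:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_with_outcome) * len(scores_without_outcome))


# Hypothetical risk scores for illustration only.
died = [12, 9, 15, 7]
survived = [4, 8, 6, 9, 3]
example_auc = auc(died, survived)
print(example_auc)
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is the scale on which the reported values (for example 0.82 for the 4C score) should be read.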
12.
Stud Health Technol Inform ; 295: 132-135, 2022 Jun 29.
Article in English | MEDLINE | ID: mdl-35773825

ABSTRACT

Hospital caregivers report patient data while being under constant pressure. These records include structured information, with some of them being derived from a restricted list of terms. Finding the right term from a large terminology can be time-consuming, harming the clinician's productivity. To deal with this hurdle, an autocomplete system is employed, providing the closest terms after a prefix is typed. While this software application clearly smoothens the term searching, this paper studies the influences of the tool on caregivers' reporting, inspecting the evolution of their typing conduct over time.


Subject(s)
Caregivers , Software , Hospitals , Humans , Retrospective Studies
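
Note on entry 12: the abstract describes an autocomplete system that proposes terms once a prefix is typed, without detailing how it works. The sketch below shows one simple way such a lookup could be done, using binary search over a sorted term list. It is an assumption for illustration only: the function name `autocomplete`, the `limit` parameter, and the sample terms are hypothetical, and the system studied in the paper may rank or match terms differently.

```python
import bisect


def autocomplete(sorted_terms: list[str], prefix: str, limit: int = 5) -> list[str]:
    """Return up to `limit` terms starting with `prefix`.

    `sorted_terms` must be lowercase and sorted; matches are contiguous
    in a sorted list, so the scan stops at the first non-matching term.
    """
    prefix = prefix.lower()
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
        if len(matches) == limit:
            break
    return matches


# Hypothetical terminology excerpt.
terms = sorted(
    t.lower()
    for t in ["Hypertension", "Hyperglycaemia", "Hypoglycaemia", "Asthma", "Hyperkalaemia"]
)
suggestions = autocomplete(terms, "hyper")
print(suggestions)
```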
13.
Stud Health Technol Inform ; 294: 43-47, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612013

ABSTRACT

Automatic classification of ECG signals has long been a research area, with considerable progress made recently. However, these advances have been achieved with increasingly complex models at the expense of model interpretability. In this research, a new model based on multivariate autoregressive model (MAR) coefficients combined with a tree-based model is proposed to classify bundle branch blocks. The advantage of the presented approach is that it yields a lightweight model which, combined with post-hoc interpretability, can bring new insights into important cross-lead dependencies that are indicative of the diseases of interest.


Subject(s)
Bundle-Branch Block , Electrocardiography , Algorithms , Bundle-Branch Block/diagnosis , Humans
14.
Stud Health Technol Inform ; 294: 317-321, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612084

ABSTRACT

In spring 2020, as the COVID-19 pandemic was in its first wave in Europe, the University Hospitals of Geneva (HUG) were tasked with taking care of all COVID-19 inpatients of the canton of Geneva. It was a crisis with very few tools to support decision-making authorities, and very little was known about the disease. The need to know more, and fast, highlighted numerous challenges across the whole data pipeline. This paper describes the decisions taken and processes developed to build a unified database to support several secondary uses of clinical data, including governance and research. HUG has had to respond to 5 major waves of COVID-19 patients since the beginning of 2020. In this context, a database for COVID-19-related data was created to support the governance of the hospital in its response to this crisis. The principles behind this database were a) a clearly defined cohort; b) a clearly defined dataset; and c) clearly defined semantics. This approach resulted in more than 28 000 variables encoded in SNOMED CT and 1 540 human-readable labels. It covers more than 216 000 patients and 590 000 inpatient stays. This database has been used daily since the beginning of the pandemic to feed the "Predict" dashboards of HUG and prediction reports, as well as several research projects.


Subject(s)
COVID-19 , Systematized Nomenclature of Medicine , Databases, Factual , Humans , Pandemics , Semantics
15.
Stud Health Technol Inform ; 294: 874-875, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612232

ABSTRACT

Many medical narratives are read by care professionals in their preferred language. These documents can be produced by organizations, authorities or national publishers. However, they are often hard to find using the usual English-based query engines such as PubMed. This work explores the possibility of automatically categorizing medical documents in French with an automatic Natural Language Processing pipeline. The pipeline is used to compare the performance of 6 different machine learning and deep neural network approaches on a large dataset from a peer-reviewed Swiss medical journal published weekly in French, covering major topics in medicine over the last 15 years. An accuracy of 96% was achieved for 5-topic classification and 81% for 20-topic classification.


Subject(s)
Machine Learning , Natural Language Processing , Language , Neural Networks, Computer , PubMed
16.
Article in English | MEDLINE | ID: mdl-35206230

ABSTRACT

The current availability of electronic health records represents an excellent research opportunity on multimorbidity, one of the most relevant public health problems nowadays. However, it also poses a methodological challenge due to the current lack of tools to access, harmonize and reuse research datasets. In FAIR4Health, a European Horizon 2020 project, a workflow to implement the FAIR (findability, accessibility, interoperability and reusability) principles on health datasets was developed, as well as two tools aimed at facilitating the transformation of raw datasets into FAIR ones and the preservation of data privacy. As part of this project, we conducted a multicentric retrospective observational study to apply the aforementioned FAIR implementation workflow and tools to five European health datasets for research on multimorbidity. We applied a federated frequent pattern growth association algorithm to identify the most frequent combinations of chronic diseases and their association with mortality risk. We identified several multimorbidity patterns clinically plausible and consistent with the bibliography, some of which were strongly associated with mortality. Our results show the usefulness of the solution developed in FAIR4Health to overcome the difficulties in data management and highlight the importance of implementing a FAIR data policy to accelerate responsible health research.


Subject(s)
Data Management , Multimorbidity , Algorithms , Electronic Health Records , Privacy
17.
Open Res Eur ; 2: 34, 2022.
Article in English | MEDLINE | ID: mdl-37645268

ABSTRACT

Due to the nature of health data, its sharing and reuse for research are limited by ethical, legal and technical barriers. The FAIR4Health project facilitated and promoted the application of FAIR principles in health research data, derived from the publicly funded health research initiatives to make them Findable, Accessible, Interoperable, and Reusable (FAIR). To confirm the feasibility of the FAIR4Health solution, we performed two pathfinder case studies to carry out federated machine learning algorithms on FAIRified datasets from five health research organizations. The case studies demonstrated the potential impact of the developed FAIR4Health solution on health outcomes and social care research. Finally, we promoted the FAIRified data to share and reuse in the European Union Health Research community, defining an effective EU-wide strategy for the use of FAIR principles in health research and preparing the ground for a roadmap for health research institutions. This scientific report presents a general overview of the FAIR4Health solution: from the FAIRification workflow design to translate raw data/metadata to FAIR data/metadata in the health research domain to the FAIR4Health demonstrators' performance.

18.
JMIR Med Inform ; 9(10): e29174, 2021 Oct 13.
Article in English | MEDLINE | ID: mdl-34643542

ABSTRACT

BACKGROUND: Since the creation of the problem-oriented medical record, the building of problem lists has been the focus of many studies. To date, this issue is not well resolved, and building an appropriate contextualized problem list is still a challenge. OBJECTIVE: This paper aims to present the process of building a shared multipurpose common problem list at the Geneva University Hospitals. This list aims to bridge the gap between clinicians' language expressed in free text and secondary uses requiring structured information. METHODS: We focused on the needs of clinicians by building a list of uniquely identified expressions to support their daily activities. In the second stage, these expressions were connected to additional information to build a complex graph of information. A list of 45,946 expressions manually extracted from clinical documents was manually curated and encoded in multiple semantic dimensions, such as International Classification of Diseases, 10th revision; International Classification of Primary Care 2nd edition; Systematized Nomenclature of Medicine Clinical Terms; or dimensions dictated by specific usages, such as identifying expressions specific to a domain, a gender, or an intervention. The list was progressively deployed for clinicians with an iterative process of quality control, maintenance, and improvements, including the addition of new expressions or dimensions for specific needs. The problem management of the electronic health record allowed the measurement and correction of encoding based on real-world use. RESULTS: The list was deployed in production in January 2017 and was regularly updated and deployed in new divisions of the hospital. Over 4 years, 684,102 problems were created using the list. The proportion of free-text entries decreased progressively from 37.47% (8321/22,206) in December 2017 to 18.38% (4547/24,738) in December 2020. 
In the last version of the list, over 14 dimensions were mapped to expressions, among which 5 were international classifications and 8 were other classifications for specific uses. The list became a central axis in the electronic health record, being used for many different purposes linked to care, such as surgical planning or emergency wards, or in research, for various predictions using machine learning techniques. CONCLUSIONS: This study breaks with common approaches primarily by focusing on real clinicians' language when expressing patients' problems and secondarily by mapping whatever is required, including controlled vocabularies to answer specific needs. This approach improves the quality of the expression of patients' problems while allowing the building of as many structured dimensions as needed to convey semantics according to specific contexts. The method is shown to be scalable, sustainable, and efficient at hiding the complexity of semantics or the burden of constraint-structured problem list entry for clinicians. Ongoing work is analyzing the impact of this approach on how clinicians express patients' problems.

19.
JMIR Med Inform ; 9(6): e27591, 2021 Jun 24.
Article in English | MEDLINE | ID: mdl-34185008

ABSTRACT

BACKGROUND: Interoperability is a well-known challenge in medical informatics. Current trends in interoperability have moved from a data model technocentric approach to sustainable semantics, formal descriptive languages, and processes. Despite many initiatives and investments for decades, the interoperability challenge remains crucial. The need for data sharing for most purposes ranging from patient care to secondary uses, such as public health, research, and quality assessment, faces unmet problems. OBJECTIVE: This work was performed in the context of a large Swiss Federal initiative aiming at building a national infrastructure for reusing consented data acquired in the health care and research system to enable research in the field of personalized medicine in Switzerland. The initiative is the Swiss Personalized Health Network (SPHN). This initiative is providing funding to foster use and exchange of health-related data for research. As part of the initiative, a national strategy to enable a semantically interoperable clinical data landscape was developed and implemented. METHODS: A deep analysis of various approaches to address interoperability was performed at the start, including large frameworks in health care, such as Health Level Seven (HL7) and Integrating the Healthcare Enterprise (IHE), and in several domains, such as regulatory agencies (eg, Clinical Data Interchange Standards Consortium [CDISC]) and research communities (eg, Observational Medical Outcomes Partnership [OMOP]), to identify bottlenecks and assess sustainability. Based on this research, a strategy composed of three pillars was designed: strong multidimensional semantics, a descriptive formal language for exchanges, and as many data models as needed to comply with the needs of various communities. RESULTS: This strategy has been implemented stepwise in Switzerland since the middle of 2019 and has been adopted by all university hospitals and high research organizations. The initiative is coordinated by a central organization, the SPHN Data Coordination Center of the SIB Swiss Institute of Bioinformatics. The semantics is mapped by domain experts on various existing standards, such as Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), Logical Observation Identifiers Names and Codes (LOINC), and International Classification of Diseases (ICD). The Resource Description Framework (RDF) is used for storing and transporting data, and to integrate information from different sources and standards. Data transformers based on the SPARQL query language are implemented to convert RDF representations to the numerous data models required by the research community or to bridge with other systems, such as electronic case report forms. CONCLUSIONS: The SPHN strategy successfully implemented existing standards in a pragmatic and applicable way. It did not try to build any new standards but used existing ones in a nondogmatic way. It has now been funded for another 4 years, bringing the Swiss landscape into a new dimension to support research in the field of personalized medicine and large interoperable clinical data.
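The abstract describes data transformers that convert RDF representations into the data models a research community needs. The following is a minimal, standard-library-only sketch of that idea, not the SPHN implementation: it matches triple patterns over an in-memory list of triples, roughly the way a SPARQL SELECT would, and emits flat rows. The `ex:` vocabulary, the `to_rows` helper, and the sample data are all hypothetical.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object) using prefixed names.
# The "ex:" terms are invented for illustration; they are not the SPHN schema.
TRIPLES = [
    ("ex:obs1", "rdf:type", "ex:LabResult"),
    ("ex:obs1", "ex:hasPatient", "ex:patient42"),
    ("ex:obs1", "ex:hasCode", "loinc:2345-7"),
    ("ex:obs1", "ex:hasValue", "5.4"),
    ("ex:obs2", "rdf:type", "ex:LabResult"),
    ("ex:obs2", "ex:hasPatient", "ex:patient42"),
    ("ex:obs2", "ex:hasCode", "loinc:718-7"),
    ("ex:obs2", "ex:hasValue", "13.9"),
    # Incomplete record: no value, so a required pattern fails to match.
    ("ex:obs3", "rdf:type", "ex:LabResult"),
    ("ex:obs3", "ex:hasPatient", "ex:patient7"),
    ("ex:obs3", "ex:hasCode", "loinc:2345-7"),
    # Different type: filtered out by the rdf:type pattern.
    ("ex:dx1", "rdf:type", "ex:Diagnosis"),
]

# Target column name -> predicate that must be present on the subject.
REQUIRED = {
    "patient": "ex:hasPatient",
    "code": "ex:hasCode",
    "value": "ex:hasValue",
}


def to_rows(triples, rdf_type):
    """Flatten subjects of one rdf:type into dict rows.

    Roughly what this SPARQL query would return:
      SELECT ?obs ?patient ?code ?value WHERE {
        ?obs a ex:LabResult ; ex:hasPatient ?patient ;
             ex:hasCode ?code ; ex:hasValue ?value .
      }
    Assumes each predicate is single-valued per subject.
    """
    by_subject = defaultdict(dict)
    for s, p, o in triples:
        by_subject[s][p] = o

    rows = []
    for s, props in sorted(by_subject.items()):
        if props.get("rdf:type") != rdf_type:
            continue
        if not all(pred in props for pred in REQUIRED.values()):
            continue  # a required pattern is missing, so no row is produced
        row = {"id": s}
        row.update({col: props[pred] for col, pred in REQUIRED.items()})
        rows.append(row)
    return rows


rows = to_rows(TRIPLES, "ex:LabResult")
for row in rows:
    print(row)
```

In a real pipeline a triple store and an actual SPARQL engine would replace the dictionary grouping; the point of the sketch is only that one semantic representation can feed several flat target models by changing the query.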

20.
J Med Internet Res ; 23(1): e24594, 2021 Jan 26.
Article in English | MEDLINE | ID: mdl-33496673

ABSTRACT

BACKGROUND: Interoperability and secondary use of data is a challenge in health care. Specifically, the reuse of clinical free text remains an unresolved problem. The Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) has become the universal language of health care and presents characteristics of a natural language. Its use to represent clinical free text could constitute a solution to improve interoperability. OBJECTIVE: Although the use of SNOMED and SNOMED CT has already been reviewed, its specific use in processing and representing unstructured data such as clinical free text has not. This review aims to better understand SNOMED CT's use for representing free text in medicine. METHODS: A scoping review was performed on the topic by searching MEDLINE, Embase, and Web of Science for publications featuring free-text processing and SNOMED CT. A recursive reference review was conducted to broaden the scope of research. The review covered the type of processed data, the targeted language, the goal of the terminology binding, the method used and, when appropriate, the specific software used. RESULTS: In total, 76 publications were selected for an extensive study. English was the target language in 91% (n=69) of publications. The most frequent types of documents for which the terminology was used were complementary exam reports (n=18, 24%) and narrative notes (n=16, 21%). Mapping to SNOMED CT was the final goal of the research in 21% (n=16) of publications and a part of the final goal in 33% (n=25). The main objectives of mapping were information extraction (n=44, 39%), feature in a classification task (n=26, 23%), and data normalization (n=23, 20%). The method used was rule-based in 70% (n=53) of publications, hybrid in 11% (n=8), and machine learning in 5% (n=4). In total, 12 different software packages were used to map text to SNOMED CT concepts, the most frequent being Medtex, Mayo Clinic Vocabulary Server, and Medical Text Extraction Reasoning and Mapping System. Full terminology was used in 64% (n=49) of publications, whereas only a subset was used in 30% (n=23) of publications. Postcoordination was proposed in 17% (n=13) of publications, and only 5% (n=4) of publications specifically mentioned the use of the compositional grammar. CONCLUSIONS: SNOMED CT has been largely used to represent free-text data, most frequently with rule-based approaches, in English. However, there is currently no easy solution for mapping free text to this terminology or for performing automatic postcoordination. Most solutions conceive SNOMED CT as a simple terminology rather than as a compositional bag of ontologies. Since 2012, the number of publications on this subject per year has decreased. However, the need for formal semantic representation of free text in health care is high, and automatic encoding into a compositional ontology could be a solution.
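Since the review finds rule-based methods in 70% of publications, a small sketch of the simplest such rule may help: a dictionary lookup that maps surface forms in free text to SNOMED CT concepts, preferring the longest match. It does not reproduce any of the software packages named above. The `LEXICON` table and `map_text` function are hypothetical, and the concept identifiers are given for illustration only and should be checked against a current SNOMED CT release.

```python
import re

# Illustrative lexicon: lower-cased surface form -> (concept ID, preferred term).
# IDs are for illustration; verify them against a current SNOMED CT release.
LEXICON = {
    "myocardial infarction": ("22298006", "Myocardial infarction"),
    "heart attack": ("22298006", "Myocardial infarction"),
    "diabetes mellitus": ("73211009", "Diabetes mellitus"),
    "diabetes": ("73211009", "Diabetes mellitus"),
    "asthma": ("195967001", "Asthma"),
}


def map_text(text, lexicon=LEXICON):
    """Return (start, end, concept_id, term) for non-overlapping matches.

    Surface forms are tried longest first, so "diabetes mellitus" is
    matched as one span instead of stopping at "diabetes".
    """
    forms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(form) for form in forms) + r")\b",
        re.IGNORECASE,
    )
    hits = []
    for match in pattern.finditer(text):
        concept_id, term = lexicon[match.group(1).lower()]
        hits.append((match.start(), match.end(), concept_id, term))
    return hits


note = "History of heart attack and diabetes mellitus; no asthma."
hits = map_text(note)
for start, end, concept_id, term in hits:
    print(f"{note[start:end]!r} -> {concept_id} |{term}|")
```

The sketch also shows why the abstract calls the problem unresolved: "no asthma" is still mapped to the asthma concept because a plain lookup ignores negation, and each match is a single precoordinated concept with no postcoordination.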


Subject(s)
Natural Language Processing, Systematized Nomenclature of Medicine, Humans