ABSTRACT
BACKGROUND: Few case reports on human infections with the beef tapeworm, Taenia saginata, and the pork tapeworm, Taenia solium, diagnosed in Belgium have been published, yet the grey literature suggests a higher number of cases. AIM: To identify and describe cases of taeniasis and cysticercosis diagnosed at two Belgian referral medical institutions from 1990 to 2015. METHODS: In this observational study, we retrospectively gathered data on taeniasis and cysticercosis cases by screening laboratory and medical record databases as well as a uniform hospital discharge dataset. RESULTS: A total of 221 confirmed taeniasis cases were identified. All cases for which the causative species could be determined (170/221, 76.9%) were found to be T. saginata infections. Of those with available information, 40.0% were asymptomatic (26/65), 15.4% reported diarrhoea (10/65), 9.2% reported anal discomfort (6/65) and 15.7% acquired the infection in Belgium (11/70). Five definitive and six probable cases of neurocysticercosis (NCC), and two cases of non-central nervous system cysticercosis (non-CNS CC) were identified. Common symptoms and signs in five of the definitive and probable NCC cases were epilepsy, headaches and/or other neurological disorders. Travel information was available for 10 of the 13 NCC and non-CNS CC cases; two were Belgians travelling to endemic areas and eight were immigrants or visitors travelling from endemic areas. CONCLUSIONS: The current study indicates that a non-negligible number of taeniasis cases visit Belgian medical facilities, and that cysticercosis is occasionally diagnosed in international travellers.
Subject(s)
Cysticercosis/diagnosis , Taenia saginata/isolation & purification , Taenia solium/isolation & purification , Taeniasis/diagnosis , Adolescent , Adult , Animals , Belgium/epidemiology , Child , Child, Preschool , Cysticercosis/epidemiology , Enzyme-Linked Immunosorbent Assay , Feces , Female , Humans , Infant , Infant, Newborn , Male , Middle Aged , Neglected Diseases/epidemiology , Retrospective Studies , Taeniasis/epidemiology , Tertiary Care Centers
ABSTRACT
Clinical codes are used for public reporting purposes, are fundamental to determining public financing for hospitals, and form the basis for reimbursement claims to insurance providers. They are assigned to a patient stay to reflect the diagnoses made and the procedures performed during that stay. This paper aims to enrich algorithms for automated clinical coding by taking a data-driven approach and by using unsupervised and semi-supervised techniques for the extraction of multi-word expressions that convey a generalisable medical meaning (referred to as concepts). Several methods for extracting concepts from text are compared, two of which are constructed from a large unannotated corpus of clinical free text. A distributional semantic model (in this case the word2vec skip-gram model) is used to generalise over concepts and retrieve relations between them. These methods are validated on three sets of patient stay data, in the disease areas of urology, cardiology, and gastroenterology. The datasets are in Dutch, which limits the concept definitions available from expert-based ontologies (e.g. UMLS). The results show that when expert-based knowledge in ontologies is unavailable, concepts derived from raw clinical texts are a reliable alternative. Both concepts derived from raw clinical texts and concepts derived from expert-created dictionaries outperform a bag-of-words approach in clinical code assignment. Adding features based on tokens that appear in a semantically similar context has a positive influence on the prediction of diagnostic codes. Furthermore, the experiments indicate that a distributional semantics model can find relations between semantically related concepts in texts but also introduces erroneous and redundant relations, which can undermine clinical coding performance.
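The contrast between a bag-of-words representation and multi-word concept features can be illustrated with a minimal Python sketch. It assumes a small hand-made concept dictionary and a greedy longest-match lookup. Both are illustrative assumptions and not the extraction methods compared in the paper. The English example phrase and the names `bag_of_words` and `concept_features` are hypothetical.

```python
def bag_of_words(tokens):
    """Baseline representation: every distinct token is a feature."""
    return set(tokens)


def concept_features(tokens, concepts, max_len=3):
    """Greedy longest-match lookup of multi-word concepts.

    `concepts` is a hypothetical dictionary of known multi-word expressions.
    Tokens that are not part of any known concept are kept as single tokens.
    """
    features = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shrink down to one token.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if n == 1 or candidate in concepts:
                features.append(candidate)
                i += n
                break
    return features


# Toy input (assumption): an English phrase instead of Dutch clinical text.
tokens = "patient with acute kidney injury after heart failure".split()
concepts = {"acute kidney injury", "heart failure"}

print(sorted(bag_of_words(tokens)))
print(concept_features(tokens, concepts))
```

In the concept-based output, "acute kidney injury" and "heart failure" each become a single feature, whereas the bag-of-words output splits them into unrelated tokens.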
Subject(s)
Clinical Coding , Knowledge Bases , Natural Language Processing , Semantics , Algorithms , Humans , Language , Netherlands
ABSTRACT
A multitude of information sources is present in the electronic health record (EHR), each of which can contain clues to automatically assign diagnosis and procedure codes. These sources, however, show information overlap and quality differences, which complicates the retrieval of these clues. Through feature selection, a denser representation with a consistent quality and less information overlap can be obtained. We introduce and compare coverage-based feature selection methods, based on confidence and information gain. These approaches were evaluated over a range of medical specialties, with seven different medical specialties for ICD-9-CM code prediction (six at the Antwerp University Hospital and one in the MIMIC-III dataset) and two different medical specialties for ICD-10-CM code prediction. Using confidence coverage to integrate all sources in an EHR shows a consistent improvement in F-measure (49.83% for diagnosis codes on average), both compared with the baseline (44.25% for diagnosis codes on average) and with using the best standalone source (44.41% for diagnosis codes on average). Confidence coverage creates a concise patient stay representation independent of a rigid framework such as UMLS, and contains easily interpretable features. Confidence coverage has several advantages over a baseline setup. In our baseline setup, feature selection was limited to a filter removing features with fewer than five total occurrences in the training set. Prediction results improved consistently when using multiple heterogeneous sources to predict clinical codes, while reducing the number of features and the processing time.
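The two feature selection setups mentioned above can be sketched in Python. The frequency filter follows the baseline as described (drop features with fewer than a minimum number of total occurrences). The coverage procedure is only an assumption about how a confidence-based coverage selection might work, since the abstract does not spell it out: features are ranked by their best confidence P(code | feature), and a feature is kept if it occurs in at least one stay that is not yet covered. The function names, the `coverage` parameter, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def frequency_filter(stays, min_count=5):
    """Baseline: keep features with at least `min_count` total occurrences."""
    counts = Counter(f for features, _ in stays for f in features)
    return {f for f, c in counts.items() if c >= min_count}


def confidence_coverage(stays, coverage=1):
    """Assumed procedure: rank features by confidence, keep those that cover
    stays which have been covered fewer than `coverage` times so far."""
    feature_count = Counter()
    pair_count = Counter()
    for features, codes in stays:
        for f in set(features):
            feature_count[f] += 1
            for c in codes:
                pair_count[(f, c)] += 1

    # Best confidence of each feature over all codes it co-occurs with.
    best = defaultdict(float)
    for (f, _code), n in pair_count.items():
        best[f] = max(best[f], n / feature_count[f])

    # Highest confidence first; ties broken by support, then alphabetically.
    ranked = sorted(best, key=lambda f: (-best[f], -feature_count[f], f))

    covered = [0] * len(stays)
    selected = []
    for f in ranked:
        hits = [i for i, (features, _) in enumerate(stays)
                if f in features and covered[i] < coverage]
        if hits:
            selected.append(f)
            for i in hits:
                covered[i] += 1
        if all(n >= coverage for n in covered):
            break
    return selected


# Toy training set (assumption): (features, assigned codes) per patient stay.
stays = [
    (["stent", "chest", "pain"], {"code_A"}),
    (["stent", "pain"], {"code_A"}),
    (["kidney", "stone", "pain"], {"code_B"}),
    (["stone", "pain"], {"code_B"}),
]

print(frequency_filter(stays, min_count=2))
print(confidence_coverage(stays))
```

On this toy data the frequency filter keeps the uninformative but frequent token "pain", while the coverage sketch stops as soon as every stay is covered by a high-confidence feature, which gives a smaller feature set.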
Subject(s)
Electronic Health Records , International Classification of Diseases , Algorithms , Humans
ABSTRACT
OBJECTIVE: Enormous amounts of healthcare data are becoming increasingly accessible through the large-scale adoption of electronic health records. In this work, structured and unstructured (textual) data are combined to assign clinical diagnostic and procedural codes (specifically International Classification of Diseases, Ninth Revision, Clinical Modification [ICD-9-CM] codes) to patient stays. We investigate whether integrating these heterogeneous data types improves prediction strength compared to using the data types in isolation. METHODS: Two separate data integration approaches were evaluated. Early data integration combines features of several sources within a single model, and late data integration learns a separate model per data source and combines these predictions with a meta-learner. Both approaches are evaluated on data sources and clinical codes from a broad set of medical specialties. RESULTS: When compared with the best individual prediction source, late data integration leads to improvements in predictive power (eg, overall F-measure increased from 30.6% to 38.3% for ICD-9-CM diagnostic codes), while early data integration is less consistent. The predictive strength strongly differs between medical specialties, both for ICD-9-CM diagnostic and procedural codes. DISCUSSION: Structured data provides complementary information to unstructured data (and vice versa) for predicting ICD-9-CM codes. This can be captured most effectively by the proposed late data integration approach. CONCLUSIONS: We demonstrated that models using multiple electronic health record data sources systematically outperform models using data sources in isolation in the task of predicting ICD-9-CM codes over a broad range of medical specialties.
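The difference between early and late data integration can be sketched in Python for a single binary code. This is a minimal illustration under stated assumptions, not the models used in the study: a small logistic regression trained by gradient descent stands in for both the per-source models and the meta-learner, the two toy sources and all function names are hypothetical, and the meta-learner is fitted on in-sample predictions for brevity (a realistic setup would use held-out predictions).

```python
import math


def predict_proba(w, x):
    """Logistic model with weights w[:-1] and bias w[-1]."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return 1.0 / (1.0 + math.exp(-z))


def fit_logreg(X, y, lr=1.0, epochs=1000):
    """Fit logistic regression by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def concat(rows):
    """Join the feature vectors of one stay across all sources."""
    return [x for row in rows for x in row]


def fit_early(sources, y):
    """Early integration: one model on the concatenated features."""
    return fit_logreg([concat(rows) for rows in zip(*sources)], y)


def predict_early(model, stay):
    return predict_proba(model, concat(stay))


def fit_late(sources, y):
    """Late integration: one model per source, then a meta-learner
    trained on the per-source predicted probabilities."""
    base = [fit_logreg(X, y) for X in sources]
    meta_X = [[predict_proba(w, row) for w, row in zip(base, rows)]
              for rows in zip(*sources)]
    return base, fit_logreg(meta_X, y)


def predict_late(model, stay):
    base, meta = model
    scores = [predict_proba(w, row) for w, row in zip(base, stay)]
    return predict_proba(meta, scores)


# Toy data (assumption): four stays, two sources with two binary features each.
structured = [[1, 0], [1, 0], [0, 1], [0, 1]]
text = [[1, 0], [0, 0], [0, 1], [0, 0]]
y = [1, 1, 0, 0]

early = fit_early([structured, text], y)
late = fit_late([structured, text], y)

print(predict_early(early, [[1, 0], [1, 0]]), predict_late(late, [[1, 0], [1, 0]]))
print(predict_early(early, [[0, 1], [0, 1]]), predict_late(late, [[0, 1], [0, 1]]))
```

In the late setup the meta-learner only sees one score per source, so a weak or noisy source can be down-weighted as a whole, whereas the early setup has to weigh every individual feature of every source in one model.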