ABSTRACT
BACKGROUND: With the widespread adoption of Blood Establishment Computer Systems and other Blood Collection and Transfusion Service (BCTS) clinical information systems (CIS), electronic blood donor, product, and patient data are now routinely required for clinical, regulatory, operational, and quality needs. That data are often not readily accessible for such secondary use within CIS databases, particularly for applications with significant data availability requirements such as machine learning and artificial intelligence. Data replication provides one avenue by which CIS data can be made more readily available. STUDY DESIGN AND METHODS: Members of the AABB's Information Systems Committee along with institutional information technology colleagues provided a multi-institutional viewpoint on data replication through the lens of BCTS specific use cases. Case studies of informatics offerings leveraging such technologies were also elicited. RESULTS: Six distinct use cases describe the potential role of data replication including the creation of data warehouses for frontline laboratory staff. Specific BCTS examples for each use case are presented to highlight the value of data replication, including visualization of critical inventory (O red blood cells, HLA-compatible platelets) and utilization analytics for patient blood management. Two case studies describe the approach to implement such technologies to (1) optimize staffing via laboratory workload reporting and (2) improve access to blood via antigen-negative blood product location services. DISCUSSION: Data replication and warehousing can empower BCTS analytic offerings not otherwise natively available through one's CIS to improve patient care and laboratory operations.
Subject(s)
Blood Transfusion , Humans , Blood Transfusion/methods , Data Warehousing , Blood Banks
ABSTRACT
INTRODUCTION: Clinical publications use mortality as a hard end point. It is unknown how many patient deaths are under-reported in institutional databases. The objective of this study was to query mortality in our patient cohort from our data warehouse and compare these deaths to those identified in different databases. METHODS: We passed the first/last name and date of birth of 134 patients through online mortality search engines (Find a Grave Index, US Cemetery and Funeral Home Collection, etc.) to assess their ability to capture patient deaths and compared that to deaths recorded from our institutional data warehouse. RESULTS: Our institutional data warehouse found approximately one-third of the total patient mortalities. After the Social Security Death Index, we found that the Find a Grave Index captured the most mortalities missed by the institutional data warehouse. These results highlight the advantages of incorporating readily available search engines into institutional data warehouses for the accurate collection of patient mortalities, particularly those that occur outside of index operative admission. CONCLUSIONS: The incorporation of the mortality search engines significantly augmented the capture of patient deaths. Our approach may be useful for tailored patient outreach and reporting mortalities with institutional data.
Subject(s)
Data Warehousing , Search Engine , Humans , Databases, Factual
ABSTRACT
BACKGROUND: Clinical data warehouses provide access to massive amounts of medical images, but these images are often heterogeneous. They can for instance include images acquired both with or without the injection of a gadolinium-based contrast agent. Harmonizing such data sets is thus fundamental to guarantee unbiased results, for example when performing differential diagnosis. Furthermore, classical neuroimaging software tools for feature extraction are typically applied only to images without gadolinium. The objective of this work is to evaluate how image translation can be useful to exploit a highly heterogeneous data set containing both contrast-enhanced and non-contrast-enhanced images from a clinical data warehouse. METHODS: We propose and compare different 3D U-Net and conditional GAN models to convert contrast-enhanced T1-weighted (T1ce) into non-contrast-enhanced (T1nce) brain MRI. These models were trained using 230 image pairs and tested on 77 image pairs from the clinical data warehouse of the Greater Paris area. RESULTS: Validation using standard image similarity measures demonstrated that the similarity between real and synthetic T1nce images was higher than between real T1nce and T1ce images for all the models compared. The best performing models were further validated on a segmentation task. We showed that tissue volumes extracted from synthetic T1nce images were closer to those of real T1nce images than volumes extracted from T1ce images. CONCLUSION: We showed that deep learning models initially developed with research quality data could synthesize T1nce from T1ce images of clinical quality and that reliable features could be extracted from the synthetic images, thus demonstrating the ability of such methods to help exploit a data set coming from a clinical data warehouse.
Subject(s)
Data Warehousing , Gadolinium , Humans , Brain/diagnostic imaging , Magnetic Resonance Imaging/methods , Neuroimaging/methods , Image Processing, Computer-Assisted/methods
ABSTRACT
BACKGROUND: Although significant progress has been made in improving the rate of survival for pediatric optic pathway gliomas (OPGs), data describing the methods of diagnosis and treatment for OPGs are limited in the modern era. This retrospective study aims to provide an epidemiological overview in the pediatric population and an update on eye care resource utilization in OPG patients using big data analysis. METHODS: Using the OptumLabs Data Warehouse, 9-11 million children from 2016 to 2021 were assessed for the presence of an OPG claim. This data set was analyzed for demographic distribution data and clinical data including average ages for computed tomography (CT), MRI, strabismus, and related treatment (surgery, chemotherapy, and radiation), as well as yearly rates for optical coherence tomography (OCT) and visual field (VF) examinations. RESULTS: Five hundred fifty-one unique patients ranging in age from 0 to 17 years had an OPG claim, with an estimated prevalence of 4.6-6.1 per 100k. Among the 476 OPG patients with at least 6 months of follow-up, 88.9% had at least one MRI and 15.3% had at least one CT. Annual rates for OCT and VF testing were similar (1.26 vs 1.35 per year), although OCT was ordered for younger patients (mean age = 9.2 vs 11.7 years, respectively). During the study period, 14.1% of OPG patients had chemotherapy, 6.1% had either surgery or radiation, and 81.7% had no treatment. CONCLUSIONS: This study updates OPG demographics for the modern era and characterizes the burden of the treatment course for pediatric OPG patients using big data analysis of a commercial claims database. OPGs had a prevalence of about 0.005%, occurring equally in boys and girls. Most did not receive treatment, and the average child had at least one claim for OCT or VF per year for clinical monitoring. This study is limited to only commercially insured children, who represent approximately half of the general child population.
Subject(s)
Neurofibromatosis 1 , Optic Nerve Glioma , Male , Female , Child , Humans , Infant, Newborn , Infant , Child, Preschool , Adolescent , Retrospective Studies , Prevalence , Data Warehousing , Optic Nerve Glioma/diagnosis , Optic Nerve Glioma/epidemiology , Optic Nerve Glioma/therapy , Visual Fields , Neurofibromatosis 1/diagnosis
ABSTRACT
BACKGROUND: Total hip, knee and shoulder arthroplasties (THKSA) are increasing due to expanding demands in the ageing population. Material surveillance is important to prevent severe complications involving implantable medical devices (IMD) by taking appropriate preventive measures. Automating the analysis of patient and IMD features could benefit physicians and public health policies, allowing early issue detection and decision support. The study aimed to demonstrate the feasibility of automated cohorting of patients with a first arthroplasty in two hospital data warehouses (HDW) in France. METHODS: The study included adult patients with an arthroplasty between 2010 and 2019 identified by 2 data sources: hospital discharge and pharmacy. Selection was based on the health insurance thesaurus of IMDs in the pharmacy database: 1,523 distinct IMD references for primary THKSA. In the hospital discharge database, 22 distinct procedures for native joint replacement were used, allowing the IMD to be matched to the surgical procedure of each selected patient. A program to automate information extraction was implemented in the 1st hospital data warehouse using natural language processing (NLP) on pharmacy labels, and was then applied to the 2nd hospital. RESULTS: The e-cohort was built with a first arthroplasty for THKSA performed in 7,587 patients with a mean age of 67.4 years, and a sex ratio of 0.75. The cohort involved 4,113 hip, 2,630 knee and 844 shoulder surgical patients. Obesity, cardio-vascular diseases and hypertension were the most frequent medical conditions. DISCUSSION: The implementation of an e-cohort for material surveillance will be easily workable across HDWs France-wide. Because no international IMD mapping exists for studying IMDs, our approach uses NLP and aims to close the gap between conventional epidemiological cohorting tools and big data approaches. CONCLUSION: This pilot study demonstrated the feasibility of an e-cohort of orthopaedic devices using clinical data warehouses. The IMD and patient features could be studied with intra-hospital follow-up and will help in analysing infectious and loosening complications.
Subject(s)
Feasibility Studies , Humans , Aged , Male , Female , Middle Aged , France , Data Warehousing , Prostheses and Implants , Aged, 80 and over , Arthroplasty, Replacement , Natural Language Processing
ABSTRACT
Monoclonal antibodies (MAs) are increasingly used in the therapeutic arsenal. Clinical Data Warehouses (CDWs) offer unprecedented opportunities for research on real-world data. The objective of this work is to develop a knowledge organization system on MAs for therapeutic use (MATUs) applicable in Europe to query CDWs from a multi-terminology server (HeTOP). After expert consensus, three main health thesauri were selected: the MeSH thesaurus, the National Cancer Institute thesaurus (NCIt) and the SNOMED CT. These thesauri contain 1,723 MA concepts, but only 99 (5.7 %) are identified as MATUs. The knowledge organisation system proposed in this article is a six-level hierarchical system organised according to the main therapeutic target of each MATU. It includes 193 different concepts organised in a cross-lingual terminology server, which will allow the inclusion of semantic extensions. Ninety-nine (51.3 %) MATU concepts and 94 (48.7 %) hierarchical concepts composed the knowledge organisation system. Two separate groups (an expert group and a validation group) carried out the selection, creation and validation processes. Queries identify, for unstructured data, 83 out of 99 (83.8 %) MATUs corresponding to 45,262 patients, 347,035 hospital stays and 427,544 health documents, and for structured data, 61 out of 99 (61.6 %) MATUs corresponding to 9,218 patients, 59,643 hospital stays and 104,737 hospital prescriptions. The volume of data in the CDW demonstrated the potential for using these data in clinical research, although not all MATUs are present in the CDW (16 missing for unstructured data and 38 for structured data). The knowledge organisation system proposed here improves the understanding of MATUs and the quality of queries, and helps clinical researchers retrieve relevant medical information. The use of this model in a CDW allows for the rapid identification of a large number of patients and health documents, either directly by a MATU of interest (e.g. Rituximab) or by searching for parent concepts (e.g. Anti-CD20 Monoclonal Antibody).
Subject(s)
Antibodies, Monoclonal , Vocabulary, Controlled , Humans , Antibodies, Monoclonal/therapeutic use , Systematized Nomenclature of Medicine , Data Warehousing , Europe
ABSTRACT
BACKGROUND: The use of real-world data (RWD) warehouses for research in Asia is on the rise, but current trends remain largely unexplored. Given the varied economic and health care landscapes in different Asian countries, understanding these trends can offer valuable insights. OBJECTIVE: We sought to discern the contemporary landscape of linked RWD warehouses and explore their trends and patterns in 3 Asian countries with contrasting economies and health care systems: Taiwan, India, and Thailand. METHODS: Using a systematic scoping review methodology, we conducted an exhaustive literature search on PubMed with filters for the English language and the past 5 years. The search combined Medical Subject Heading terms and specific keywords. Studies were screened against strict eligibility criteria to identify eligible studies using RWD databases from more than one health care facility in at least 1 of the 3 target countries. RESULTS: Our search yielded 2277 studies, of which 833 (36.6%) met our criteria. Overall, single-country studies (SCS) dominated at 89.4% (n=745), with cross-country collaboration studies (CCCS) being at 10.6% (n=88). However, the country-wise breakdown showed that of all the SCS, 623 (83.6%) were from Taiwan, 81 (10.9%) from India, and 41 (5.5%) from Thailand. Among the total studies conducted in each country, India at 39.1% (n=133) and Thailand at 43.1% (n=72) had a significantly higher percentage of CCCS compared to Taiwan at 7.6% (n=51). Over a 5-year span from 2017 to 2022, India and Thailand experienced an annual increase in RWD studies by approximately 18.2% and 13.8%, respectively, while Taiwan's contributions remained consistent. Comparative effectiveness research (CER) was predominant in Taiwan (n=410, or 65.8% of SCS) but less common in India (n=12, or 14.8% of SCS) and Thailand (n=11, or 26.8% of SCS). CER percentages in CCCS were similar across the 3 countries, ranging from 19.2% (n=10) to 29% (n=9). 
The type of RWD source also varied significantly across countries, with India demonstrating a high reliance on electronic medical records or electronic health records at 55.6% (n=45) of SCS and Taiwan showing an increasing trend in their use over the period. Registries were used in 26 (83.9%) CCCS and 31 (75.6%) SCS from Thailand but in <50% of SCS from Taiwan and India. Health insurance/administrative claims data were used in most of the SCS from Taiwan (n=458, 73.5%). There was a consistent predominant focus on cardiology/metabolic disorders in all studies, with a noticeable increase in oncology and infectious disease research from 2017 to 2022. CONCLUSIONS: This review provides a comprehensive understanding of the evolving landscape of RWD research in Taiwan, India, and Thailand. The observed differences and trends emphasize the unique economic, clinical, and research settings in each country, advocating for tailored strategies for leveraging RWD for future health care research and decision-making. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR2-10.2196/43741.
Subject(s)
Biomedical Research , Data Warehousing , Databases, Factual , Humans , Asian , India , Taiwan , Thailand
ABSTRACT
BACKGROUND: Colorectal cancer is a leading cause of cancer deaths. Several screening tests, such as colonoscopy, can be used to find polyps or colorectal cancer. Colonoscopy reports are often written in unstructured narrative text. The information embedded in the reports can be used for various purposes, including colorectal cancer risk prediction, follow-up recommendation, and quality measurement. However, the availability and accessibility of unstructured text data are still insufficient despite the large amounts of accumulated data. We aimed to develop and apply deep learning-based natural language processing (NLP) methods to detect colonoscopic information. METHODS: This study applied several deep learning-based NLP models to colonoscopy reports. Approximately 280,668 colonoscopy reports were extracted from the clinical data warehouse of Samsung Medical Center. For 5,000 reports, procedural information and colonoscopic findings were manually annotated with 17 labels. We compared the long short-term memory (LSTM) and BioBERT model to select the one with the best performance for colonoscopy reports, which was the bidirectional LSTM with conditional random fields. Then, we applied pre-trained word embedding using large unlabeled data (280,668 reports) to the selected model. RESULTS: The NLP model with pre-trained word embedding performed better for most labels than the model with one-hot encoding. The F1 scores for colonoscopic findings were: 0.9564 for lesions, 0.9722 for locations, 0.9809 for shapes, 0.9720 for colors, 0.9862 for sizes, and 0.9717 for numbers. CONCLUSIONS: This study applied deep learning-based clinical NLP models to extract meaningful information from colonoscopy reports. The method in this study achieved promising results that demonstrate it can be applied to various practical purposes.
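The F1 scores reported above combine precision and recall into a single figure per label. As a minimal sketch of how such a score is derived, assuming entity-level counts of true positives, false positives and false negatives, the helper below uses made-up counts; the function name `f1_score` and the numbers are illustrative and are not taken from the study.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall for one label."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for a single label (not data from the abstract).
example_f1 = f1_score(true_pos=90, false_pos=10, false_neg=10)
print(f"F1 = {example_f1:.4f}")
```

The sketch does not guard against labels with no predicted or no true entities, where precision or recall would be undefined.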
Subject(s)
Colorectal Neoplasms , Deep Learning , Humans , Colonoscopy , Natural Language Processing , Data Warehousing
ABSTRACT
BACKGROUND: Aggregate electronic data repositories and population-level cross-sectional surveys play a critical role in HIV programme monitoring and surveillance for data-driven decision-making. However, these data sources have inherent limitations including inability to respond to public health priorities in real-time and to longitudinally follow up clients for ascertainment of long-term outcomes. Electronic medical records (EMRs) have tremendous potential to bridge these gaps when harnessed into a centralised data repository. We describe the evolution of EMRs and the development of a centralised national data warehouse (NDW) repository. Further, we describe the distribution and representativeness of data from the NDW and explore its potential for population-level surveillance of HIV testing, care and treatment in Kenya. MAIN BODY: Health information systems in Kenya have evolved from simple paper records to web-based EMRs with features that support data transmission to the NDW. The NDW design includes four layers: data warehouse application programming interface (DWAPI), central staging, integration service, and data visualization application. The number of health facilities uploading individual-level data to the NDW increased from 666 in 2016 to 1,516 in 2020, covering 41 of 47 counties in Kenya. By the end of 2020, the NDW hosted longitudinal data from 1,928,458 individuals ever started on antiretroviral therapy (ART). In 2020, there were 936,869 individuals who were active on ART in the NDW, compared to 1,219,276 individuals on ART reported in the aggregate-level Kenya Health Information System (KHIS), suggesting 77% coverage. The proportional distribution of individuals on ART by counties in the NDW was consistent with that from KHIS, suggesting representativeness and generalizability at the population level. 
CONCLUSION: The NDW presents opportunities for individual-level HIV programme monitoring and surveillance because of its longitudinal design and its ability to respond to public health priorities in real time. A comparison with estimates from KHIS demonstrates that the NDW has high coverage and that the data may be representative and generalizable at the population level. The NDW is therefore a unique and complementary resource for HIV programme monitoring and surveillance with potential to strengthen timely data-driven decision-making towards HIV epidemic control in Kenya. DATABASE LINK: ( https://dwh.nascop.org/ ).
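The 77% coverage figure follows directly from the two 2020 counts given in the abstract. The short check below reproduces that arithmetic; the variable names are illustrative.

```python
# Counts for 2020 as stated in the abstract.
active_on_art_ndw = 936_869   # individuals active on ART in the NDW
on_art_khis = 1_219_276       # individuals on ART reported in KHIS

coverage = active_on_art_ndw / on_art_khis
print(f"NDW coverage relative to KHIS: {coverage:.1%}")
```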
Subject(s)
Data Warehousing , Electronic Health Records , Humans , Cross-Sectional Studies , Kenya/epidemiology , HIV Testing
ABSTRACT
This paper describes the development and implementation of an anesthesia data warehouse in the Lille University Hospital. We share the lessons learned from a ten-year project and provide guidance for the implementation of such a project. Our clinical data warehouse is mainly fed with data collected by the anesthesia information management system and hospital discharge reports. The data warehouse stores historical data with a precision of one day for administrative data and one second for monitoring data. Datamarts complete the architecture and provide secondary computed data and indicators, so that queries can be executed faster and more easily. Between 2010 and 2021, 636 784 anesthesia records were integrated for 353 152 patients. We report the main concerns and barriers encountered during the development of this project and provide 8 tips to handle them. We have implemented our data warehouse into the OMOP common data model as a complementary downstream data model. The next step of the project will be to disseminate the use of the OMOP data model for anesthesia and critical care, and drive the trend towards federated learning to enhance collaborations and multicenter studies.
Subject(s)
Anesthesia , Data Warehousing , Humans
ABSTRACT
This study aims to show the feasibility and benefit of single queries in a research data warehouse combining data from a hospital's clinical and imaging systems. We used a comprehensive integration of a production picture archiving and communication system (PACS) with a clinical data warehouse (CDW) for research to create a system that allows data from both domains to be queried jointly with a single query. To achieve this, we mapped the DICOM information model to the extended entity-attribute-value (EAV) data model of a CDW, which allows data linkage and query constraints on multiple levels: the patient, the encounter, a document, and a group level. Accordingly, we have integrated DICOM metadata directly into CDW and linked it to existing clinical data. We included data collected in 2016 and 2017 from the Department of Internal Medicine in this analysis for two query inquiries from researchers targeting research about a disease and in radiology. We obtained quantitative information about the current availability of combinations of clinical and imaging data using a single multilevel query compiled for each query inquiry. We compared these multilevel query results to results that linked data at a single level, resulting in a quantitative representation of results that was up to 112% and 573% higher. An EAV data model can be extended to store data from clinical systems and PACS on multiple levels to enable combined querying with a single query to quickly display actual frequency data.
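To make the idea of linking at different levels concrete, the sketch below stores clinical facts and DICOM metadata in one toy EAV table and counts patients who have both a diagnosis and a CT image, first linked at the patient level and then at the encounter level. The table layout, attribute names and rows are assumptions made for illustration; they do not reproduce the schema or the data described in the abstract.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE eav (patient_id INTEGER, encounter_id INTEGER, "
    "attribute TEXT, value TEXT)"
)
# Hypothetical rows: clinical facts and DICOM metadata share one table.
conn.executemany(
    "INSERT INTO eav VALUES (?, ?, ?, ?)",
    [
        (1, 10, "diagnosis", "pneumonia"),
        (1, 10, "dicom_modality", "CT"),   # same encounter as the diagnosis
        (2, 20, "diagnosis", "pneumonia"),
        (2, 21, "dicom_modality", "CT"),   # same patient, different encounter
        (3, 30, "diagnosis", "pneumonia"), # no imaging at all
    ],
)


def count_patients(link_column: str) -> int:
    """Count patients with both facts, linked on the given level."""
    if link_column not in ("patient_id", "encounter_id"):
        raise ValueError("unsupported linkage level")
    sql = f"""
        SELECT COUNT(DISTINCT a.patient_id)
        FROM eav AS a
        JOIN eav AS b ON a.{link_column} = b.{link_column}
        WHERE a.attribute = 'diagnosis' AND a.value = 'pneumonia'
          AND b.attribute = 'dicom_modality' AND b.value = 'CT'
    """
    return conn.execute(sql).fetchone()[0]


patient_level = count_patients("patient_id")
encounter_level = count_patients("encounter_id")
print(patient_level, encounter_level)
```

In this toy data the count depends on the linkage level chosen, which is the kind of difference a multilevel query makes visible.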
Subject(s)
Radiology Information Systems , Radiology , Humans , Data Warehousing , Information Storage and Retrieval , Diagnostic Imaging
ABSTRACT
Importance: Given the rapid increase in telehealth utilization since the onset of the COVID-19 pandemic, it has become essential to examine the vast amount of available data on telehealth encounters in order to conduct more cogent, robust, and large-scope research studies on the utility, cost-impact, and effect on clinical outcomes that telehealth can potentially provide. However, the diversity of data collected by numerous telehealth organizations has made that type of analysis difficult. Objective: The University of Mississippi Medical Center (UMMC), a Telehealth Center of Excellence designated by the Health Resources and Services Administration, is creating a National Telehealth Data Warehouse. Design: UMMC will develop the data warehouse in Microsoft Azure and will use a data dictionary that was created by the Center for Telehealth and eHealth Law (CTeL) to support their national cost-benefit study on the use of telehealth during COVID-19. Impact: The data warehouse will provide unparalleled opportunities to conduct cost-benefit and cost-effectiveness analyses on telehealth, to develop and test quality measures specific to telehealth, and to understand how telehealth can reduce disparities in health care and expand access to care for everyone. The warehouse is expected to go live in the Summer of 2023.
Subject(s)
COVID-19 , Telemedicine , Humans , COVID-19/epidemiology , Pandemics , Data Warehousing , Hospitals
ABSTRACT
The use of Clinical Data Warehouses (CDWs) for research and quality improvement has become more frequent in the last 10 years. In this study, we used a CDW to determine the effectiveness of pressure ulcer interventions offered by ward nurses and wound care nursing specialists. A retrospective clinical outcomes study utilising the CDW was carried out. We identified 1415 patients who were assessed as being at risk of pressure ulcers from 1 July 2019 to 31 December 2019. Kaplan-Meier survival analyses were used to estimate the time to occurrence of pressure ulcers. We compared the survival curves of each group by applying the log-rank test for significance. The overall median time to occurrence for both groups was 13 days (95% CI range: 11-14 days). The control group showed a longer median time (14 days) to occurrence than the case group (12 days). In pressure ulcer stage I, the case group showed a longer median time (14 days) to occurrence than the control group (8 days), indicating that the intervention provided by the wound care nursing specialist was effective in stage I and delayed the occurrence of pressure ulcers. The findings may be used as preliminary data for the utilisation of the CDW in the field of nursing research in the future. In addition, facilitating access to wound care nursing specialists in the general wards should be effective in decreasing incidence rates.
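The median times to occurrence quoted above come from Kaplan-Meier curves. As a minimal sketch of that estimator, assuming right-censored follow-up data, the code below computes the product-limit survival curve and its median for a small invented sample; the function names and the data are hypothetical and unrelated to the study cohort. Exact fractions are used so that the comparison with 1/2 is not affected by floating-point rounding.

```python
from fractions import Fraction


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event occurred.

    times  : follow-up time per patient (e.g. days)
    events : True if the ulcer occurred at that time, False if censored
    """
    at_risk = len(times)
    survival = Fraction(1)
    curve = []
    for t in sorted(set(times)):
        occurred = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored
        if occurred:
            survival *= 1 - Fraction(occurred, at_risk)
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_time(curve):
    """Smallest time at which estimated survival is at most 1/2."""
    for t, s in curve:
        if s <= Fraction(1, 2):
            return t
    return None  # median not reached within follow-up


# Hypothetical sample: six patients, two of them censored.
days = [5, 8, 12, 12, 14, 20]
ulcer = [True, True, True, False, True, False]
curve = kaplan_meier(days, ulcer)
print(curve, median_time(curve))
```

Comparing two such curves with a log-rank test, as the study did, would normally be done with a statistics package rather than by hand.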
Subject(s)
Pressure Ulcer , Humans , Pressure Ulcer/epidemiology , Tertiary Care Centers , Retrospective Studies , Data Warehousing , Republic of Korea
ABSTRACT
BACKGROUND: Knowledge graphs (KGs) play a key role to enable explainable artificial intelligence (AI) applications in healthcare. Constructing clinical knowledge graphs (CKGs) against heterogeneous electronic health records (EHRs) has been desired by the research and healthcare AI communities. From the standardization perspective, community-based standards such as the Fast Healthcare Interoperability Resources (FHIR) and the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) are increasingly used to represent and standardize EHR data for clinical data analytics, however, the potential of such a standard on building CKG has not been well investigated. OBJECTIVE: To develop and evaluate methods and tools that expose the OMOP CDM-based clinical data repositories into virtual clinical KGs that are compliant with FHIR Resource Description Framework (RDF) specification. METHODS: We developed a system called FHIR-Ontop-OMOP to generate virtual clinical KGs from the OMOP relational databases. We leveraged an OMOP CDM-based Medical Information Mart for Intensive Care (MIMIC-III) data repository to evaluate the FHIR-Ontop-OMOP system in terms of the faithfulness of data transformation and the conformance of the generated CKGs to the FHIR RDF specification. RESULTS: A beta version of the system has been released. A total of more than 100 data element mappings from 11 OMOP CDM clinical data, health system and vocabulary tables were implemented in the system, covering 11 FHIR resources. The generated virtual CKG from MIMIC-III contains 46,520 instances of FHIR Patient, 716,595 instances of Condition, 1,063,525 instances of Procedure, 24,934,751 instances of MedicationStatement, 365,181,104 instances of Observations, and 4,779,672 instances of CodeableConcept. Patient counts identified by five pairs of SQL (over the MIMIC database) and SPARQL (over the virtual CKG) queries were identical, ensuring the faithfulness of the data transformation. 
Generated CKG in RDF triples for 100 patients were fully conformant with the FHIR RDF specification. CONCLUSION: The FHIR-Ontop-OMOP system can expose OMOP database as a FHIR-compliant RDF graph. It provides a meaningful use case demonstrating the potentials that can be enabled by the interoperability between FHIR and OMOP CDM. Generated clinical KGs in FHIR RDF provide a semantic foundation to enable explainable AI applications in healthcare.
Subject(s)
Artificial Intelligence , Pattern Recognition, Automated , Data Warehousing , Delivery of Health Care , Electronic Health Records , Humans
ABSTRACT
The National Genomics Data Center (NGDC) provides a suite of database resources to support worldwide research activities in both academia and industry. With the rapid advancements in higher-throughput and lower-cost sequencing technologies and accordingly the huge volume of multi-omics data generated at exponential scales and rates, NGDC is continually expanding, updating and enriching its core database resources through big data integration and value-added curation. In the past year, efforts for update have been mainly devoted to BioProject, BioSample, GSA, GWH, GVM, NONCODE, LncBook, EWAS Atlas and IC4R. Newly released resources include three human genome databases (PGG.SNV, PGG.Han and CGVD), eLMSG, EWAS Data Hub, GWAS Atlas, iSheep and PADS Arsenal. In addition, four web services, namely, eGPS Cloud, BIG Search, BIG Submission and BIG SSO, have been significantly improved and enhanced. All of these resources along with their services are publicly accessible at https://bigd.big.ac.cn.
Subject(s)
Computational Biology/methods , Computational Biology/organization & administration , Databases, Genetic , Genomics/methods , Genomics/organization & administration , Web Browser , Data Warehousing , Genome, Human , Genome-Wide Association Study/methods , Humans
ABSTRACT
BACKGROUND: South Africa's National Health Laboratory Service (NHLS), the only clinical laboratory service in the country's public health sector, is an important resource for monitoring public health programmes. OBJECTIVES: We describe NHLS data quality, particularly patient demographics among infants, and the effect this has on linking multiple test results to a single patient. METHODS: A retrospective descriptive analysis of NHLS data from 1 January 2017 to 1 September 2020 was performed. A validated probabilistic record-linking algorithm linked multiple results to individual patients in lieu of a unique patient identifier. Paediatric HIV PCR data were used to illustrate the effect on monitoring and evaluating a public health programme. Descriptive statistics including medians, proportions and interquartile ranges are reported, with Chi-square univariate tests for independence used to determine associations between variables. RESULTS: During the period analysed, 485 300 007 tests, 98 217 642 encounters and 35 771 846 patients met criteria for analysis. Overall, 15.80% (n = 15 515 380) of all encounters had a registered national identity (ID) number, 2.11% (n = 2 069 785) were registered without a given name, 63.15% (n = 62 020 107) were registered to women and 32.89% (n = 32 304 329) of all folder numbers were listed as either the patient's date of birth or unknown. For infants tested at < 7 days of age (n = 2 565 329), 0.099% (n = 2 534) had an associated ID number and 48.87% (n = 1 253 620) were registered without a given name. Encounters with a given name were linked to a subsequent encounter 40.78% (n = 14 180 409 of 34 775 617) of the time, significantly more often than the 21.85% (n = 217 660 of 996 229) of encounters registered with a baby-derivative name (p-value < 0.001). CONCLUSION: Unavailability and poor capture of patient demographics, especially among infants and children, affect the ability to accurately monitor routine health programmes. A unique national patient identifier, other than the national ID number, is urgently required and must be available at birth if South Africa is to accurately monitor programmes such as the Prevention of Mother-to-Child Transmission of HIV.
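The linkage rates reported for given-name and baby-derivative-name encounters can be re-derived from the published counts. The following is a minimal sketch, not the authors' analysis code: it assumes a plain Pearson chi-square test on a 2x2 table without continuity correction, and the function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 degree of freedom)
    for the 2x2 table [[a, b], [c, d]], without continuity correction."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts taken from the abstract (linked encounters, total encounters).
given_linked, given_total = 14_180_409, 34_775_617
baby_linked, baby_total = 217_660, 996_229

rate_given = given_linked / given_total
rate_baby = baby_linked / baby_total

# Rows: name type; columns: linked vs. not linked to a subsequent encounter.
chi2, p_value = chi_square_2x2(
    given_linked, given_total - given_linked,
    baby_linked, baby_total - baby_linked,
)

print(f"given name linked: {rate_given:.2%}")
print(f"baby-derivative name linked: {rate_baby:.2%}")
print(f"chi-square = {chi2:.0f}, p < 0.001: {p_value < 0.001}")
```

With samples this large the p-value underflows to zero in floating point, which is consistent with the reported p-value < 0.001 but says nothing about the size of the effect; the difference in proportions carries that information.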
Subject(s)
HIV Infections , Infectious Disease Transmission, Vertical , Child , Child Health , Data Accuracy , Data Warehousing , Female , HIV Infections/diagnosis , HIV Infections/epidemiology , HIV Infections/prevention & control , Humans , Infant , Infant, Newborn , Infectious Disease Transmission, Vertical/prevention & control , Retrospective Studies , South Africa/epidemiology
ABSTRACT
BACKGROUND: Unstructured data from electronic health records represent a wealth of information. Doc'EDS is a pre-screening tool based on textual and semantic analysis. The Doc'EDS system provides a graphical user interface to search documents in French. The aim of this study was to present the Doc'EDS tool and to provide a formal evaluation of its semantic features. METHODS: Doc'EDS is a search tool built on top of the clinical data warehouse developed at Rouen University Hospital. This tool is a multilevel search engine combining structured and unstructured data. It also provides basic analytical features and semantic utilities. A formal evaluation was conducted to measure the impact of Natural Language Processing algorithms. RESULTS: Approximately 18.1 million narrative documents are stored in Doc'EDS. The formal evaluation was conducted on 5000 manually collected clinical concepts. The F-measures for negative concepts and hypothetical concepts were 0.89 and 0.57, respectively. CONCLUSION: In this formal evaluation, we have shown that Doc'EDS is able to deal with language subtleties to enhance an advanced full-text search in French health documents. The Doc'EDS tool is currently used on a daily basis to help researchers identify patient cohorts using unstructured data.
Subject(s)
Data Warehousing , Semantics , Electronic Health Records , Humans , Natural Language Processing , Search Engine
ABSTRACT
Manufacturers are shifting from a traditional product-centric business paradigm to a service-centric one by offering products accompanied by services, an offering known as a Product-Service System (PSS). PSS customization entails configuring products with varying degrees of differentiation to meet the needs of various customers. This is combined with service customization, in which customers extend configured products with smart IoT devices (e.g., sensors) to improve product usage and facilitate the transition to smart connected products. The concept of PSS customization is gaining significant interest; however, numerous challenges must still be addressed when designing and offering customized PSSs, such as choosing the optimum types of sensors to install on products and their adequate locations during the service customization process. In this paper, we propose a data warehouse-based recommender system (RS) that collects and analyzes large volumes of usage data from products similar to the one that the customer needs to customize by adding IoT smart devices. The analysis of these data helps in identifying the most critical parts, those with the highest number of incidents, and the causes of those incidents. As a result, sensor types are determined and recommended to the customer based on the causes of these incidents. The utility and applicability of the proposed RS have been demonstrated through its application in a case study that considers the rotary spindle units of a CNC milling machine.
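The recommendation logic described in this abstract (rank parts by incident count, then map incident causes to sensor types) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the incident records, the cause-to-sensor mapping `SENSOR_FOR_CAUSE`, and the function `recommend_sensors` are hypothetical and do not come from the paper, which describes a data warehouse rather than an in-memory list.

```python
from collections import Counter

# Hypothetical incident records as (part, cause) pairs; not data from the paper.
incidents = [
    ("bearing", "overheating"),
    ("bearing", "vibration"),
    ("bearing", "overheating"),
    ("drive belt", "wear"),
    ("motor", "overheating"),
    ("bearing", "vibration"),
    ("bearing", "overheating"),
]

# Hypothetical mapping from an incident cause to a sensor type.
SENSOR_FOR_CAUSE = {
    "overheating": "temperature sensor",
    "vibration": "accelerometer",
    "wear": "acoustic emission sensor",
}


def recommend_sensors(records, top_parts=1):
    """Pick the parts with the most incidents and, for each, list one
    sensor type per observed cause, most frequent cause first."""
    part_counts = Counter(part for part, _ in records)
    recommendations = {}
    for part, _ in part_counts.most_common(top_parts):
        cause_counts = Counter(cause for p, cause in records if p == part)
        recommendations[part] = [
            SENSOR_FOR_CAUSE[cause] for cause, _ in cause_counts.most_common()
        ]
    return recommendations


result = recommend_sensors(incidents)
print(result)
```

In a warehouse setting the two counting steps would more likely be aggregate queries over a fact table of incidents; the in-memory `Counter` stands in for that aggregation here.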
Subject(s)
Commerce , Data Warehousing
ABSTRACT
Data derived from the electronic health record (EHR) are frequently extracted using undefined approaches that may affect the accuracy of collected variables. Further, efforts to assess data accuracy often suffer from limited collaboration between clinicians and the data analysts who perform the extraction. In this manuscript, we describe the methodology behind the creation of a structured, rigorously derived intensive care unit (ICU) data mart based on data automatically and routinely derived from the EHR. This ICU data mart includes high-quality data elements commonly used for quality improvement and research purposes. These data elements were identified by physicians working closely with data analysts to iteratively develop and refine algorithmic definitions for complex outcomes and risk factors. We contend that this methodology can be reproduced and applied at other institutions or in other clinical domains to create high-quality data marts, inclusive of complex outcomes data.
Subject(s)
Data Warehousing , Quality Improvement , Data Accuracy , Electronic Health Records , Humans , Intensive Care Units
ABSTRACT
Background and Objectives: Identifying risk factors is important for preventing postoperative delirium (POD). However, the relationship between blood transfusion and POD is still controversial. The aim of this study was to identify the risk factors for POD, to evaluate the impact of blood transfusion on the development of POD among people undergoing spinal fusion surgery, and to show the effectiveness of big data analytics using a clinical data warehouse (CDW). Materials and Methods: The medical data of patients who underwent spinal fusion surgery were obtained from the CDW of the five hospitals of Hallym University Medical Center. Clinical features, laboratory findings, perioperative variables, and medication history were compared between patients without POD and with POD. Results: 234 of 3967 patients (5.9%) developed POD. In multivariate logistic regression analysis, the risk factors for POD were as follows: Parkinson's disease (OR 5.54, 95% CI 2.15-14.27; p < 0.001), intensive care unit (OR 3.45, 95% CI 2.42-4.91; p < 0.001), antipsychotic drugs (OR 3.35, 95% CI 1.91-5.89; p < 0.001), old age (≥70 years) (OR 3.08, 95% CI 2.14-4.43; p < 0.001), and depression (OR 2.8, 95% CI 1.27-6.2; p < 0.001). Intraoperative transfusion (OR 1.1, 95% CI 0.91-1.34; p = 0.582) and postoperative transfusion (OR 0.91, 95% CI 0.74-1.12; p = 0.379) had no statistically significant effect on the incidence of POD. Conclusions: There was no relationship between perioperative blood transfusion and the incidence of POD in spinal fusion surgery. Big data analytics using a CDW could be helpful for the comprehensive understanding of the risk factors for POD, and for preventing POD in spinal fusion surgery.
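A quick way to read the odds ratios in this abstract is to check whether each 95% confidence interval contains 1, which corresponds to no statistically significant effect at the 0.05 level. The sketch below is illustrative only and is not the authors' analysis: it uses a subset of the reported values, and the helper name `ci_excludes_one` is hypothetical.

```python
# Odds ratio, lower and upper 95% CI bounds, as reported in the abstract.
reported = {
    "Parkinson's disease": (5.54, 2.15, 14.27),
    "old age (>=70 years)": (3.08, 2.14, 4.43),
    "intraoperative transfusion": (1.1, 0.91, 1.34),
    "postoperative transfusion": (0.91, 0.74, 1.12),
}


def ci_excludes_one(lower, upper):
    """True when the interval lies entirely above or entirely below 1."""
    return not (lower <= 1.0 <= upper)


# Crude incidence of POD from the reported counts.
incidence = 234 / 3967

significant = {
    name: ci_excludes_one(lower, upper)
    for name, (_, lower, upper) in reported.items()
}

print(f"POD incidence: {incidence:.1%}")
for name, flag in significant.items():
    print(f"{name}: {'significant' if flag else 'not significant'} at 0.05")
```

Both transfusion intervals straddle 1, which matches the abstract's statement that transfusion had no statistically significant effect on the incidence of POD.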