Abstract
BACKGROUND: Asia consists of diverse nations with extremely variable health care systems. Integrated real-world data (RWD) research warehouses provide vast interconnected data sets that uphold statistical rigor. Yet, their intricate details remain underexplored, restricting their broader applications. OBJECTIVE: Building on our previous research that analyzed integrated RWD warehouses in India, Thailand, and Taiwan, this study extends the research to 7 distinct health care systems: Hong Kong, Indonesia, Malaysia, Pakistan, the Philippines, Singapore, and Vietnam. We aimed to map the evolving landscape of RWD, preferences for methodologies, and database use and archetype the health systems based on existing intrinsic capability for RWD generation. METHODS: A systematic scoping review methodology was used, centering on contemporary English literature on PubMed (search date: May 9, 2023). Rigorous screening as defined by eligibility criteria identified RWD studies from multiple health care facilities in at least 1 of the 7 target Asian nations. Point estimates and their associated errors were determined for the data collected from eligible studies. RESULTS: Of the 1483 real-world evidence citations identified on May 9, 2023, a total of 369 (24.9%) fulfilled the requirements for data extraction and subsequent analysis. Singapore, Hong Kong, and Malaysia contributed to ≥100 publications, with each country marked by a higher proportion of single-country studies at 51% (80/157), 66.2% (86/130), and 50% (50/100), respectively, and were classified as solo scholars. Indonesia, Pakistan, Vietnam, and the Philippines had fewer publications and a higher proportion of cross-country collaboration studies (CCCSs) at 79% (26/33), 58% (18/31), 74% (20/27), and 86% (19/22), respectively, and were classified as global collaborators. Collaboration with countries outside the 7 target nations appeared in 84.2% to 97.7% of the CCCSs of each nation. 
Among target nations, Singapore and Malaysia emerged as preferred research partners for other nations. From 2018 to 2023, most nations showed an increasing trend in study numbers, with Vietnam (24.5%) and Pakistan (21.2%) leading the growth; the only exception was the Philippines, which declined by 14.5%. Clinical registry databases were predominant across all CCCSs from every target nation. For single-country studies, Indonesia, Malaysia, and the Philippines favored clinical registries; Singapore had a balanced use of clinical registries and electronic medical or health records, whereas Hong Kong, Pakistan, and Vietnam leaned toward electronic medical or health records. Overall, 89.9% (310/345) of the studies took >2 years from completion to publication. CONCLUSIONS: The observed variations in contemporary RWD publications across the 7 nations in Asia exemplify distinct research landscapes across nations that are partially explained by their diverse economic, clinical, and research settings. Nevertheless, recognizing these variations is pivotal for fostering tailored, synergistic strategies that amplify RWD's potential in guiding future health care research and policy decisions. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR2-10.2196/43741.
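The proportions reported above can be reproduced with a short calculation. The following is a minimal sketch, assuming a binomial proportion with a Wald (normal-approximation) 95% interval; the abstract does not say which error measure the authors used, so the interval method and the function name `proportion_with_ci` are illustrative assumptions.

```python
import math


def proportion_with_ci(successes: int, total: int, z: float = 1.96):
    """Return (estimate, standard_error, lower, upper) for a binomial proportion.

    Assumption: a Wald interval is used here purely for illustration;
    the source does not state how its errors were computed.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    se = math.sqrt(p * (1 - p) / total)
    lower = max(0.0, p - z * se)
    upper = min(1.0, p + z * se)
    return p, se, lower, upper


# 369 of the 1483 identified citations were eligible (figures from the abstract).
p, se, lower, upper = proportion_with_ci(369, 1483)
print(f"eligible: {100 * p:.1f}% (95% CI {100 * lower:.1f}% to {100 * upper:.1f}%)")
```

The same function applies to the per-country shares, for example `proportion_with_ci(80, 157)` for single-country studies from Singapore.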
Subject(s)
Delivery of Health Care , Humans , Delivery of Health Care/statistics & numerical data , Asia , Vietnam , Philippines , Indonesia , Malaysia , Pakistan , Singapore , Databases, Factual
Abstract
The transition to smart manufacturing introduces heightened complexity in regard to the machinery and equipment used within modern collaborative manufacturing landscapes, presenting significant risks associated with equipment failures. The core ambition of smart manufacturing is to elevate automation through the integration of state-of-the-art technologies, including artificial intelligence (AI), the Internet of Things (IoT), machine-to-machine (M2M) communication, cloud technology, and expansive big data analytics. This technological evolution underscores the necessity for advanced predictive maintenance strategies that proactively detect equipment anomalies before they escalate into costly downtime. Addressing this need, our research presents an end-to-end platform that merges the organizational capabilities of data warehousing with the computational efficiency of Apache Spark. This system adeptly manages voluminous time-series sensor data, leverages big data analytics for the seamless creation of machine learning models, and utilizes an Apache Spark-powered engine for the instantaneous processing of streaming data for fault detection. This comprehensive platform exemplifies a significant leap forward in smart manufacturing, offering a proactive maintenance model that enhances operational reliability and sustainability in the digital manufacturing era.
Abstract
Monoclonal antibodies (MAs) are increasingly used in the therapeutic arsenal. Clinical Data Warehouses (CDWs) offer unprecedented opportunities for research on real-world data. The objective of this work is to develop a knowledge organization system on MAs for therapeutic use (MATUs) applicable in Europe to query CDWs from a multi-terminology server (HeTOP). After expert consensus, three main health thesauri were selected: the MeSH thesaurus, the National Cancer Institute thesaurus (NCIt) and SNOMED CT. These thesauri contain 1,723 MA concepts, but only 99 (5.7 %) are identified as MATUs. The knowledge organisation system proposed in this article is a six-level hierarchical system that classifies MATUs according to their main therapeutic target. It includes 193 different concepts organised in a cross-lingual terminology server, which will allow the inclusion of semantic extensions. Ninety-nine (51.3 %) MATU concepts and 94 (48.7 %) hierarchical concepts make up the knowledge organisation system. Two separate groups (an expert group and a validation group) carried out the selection, creation and validation processes. Queries identify, for unstructured data, 83 out of 99 (83.8 %) MATUs corresponding to 45,262 patients, 347,035 hospital stays and 427,544 health documents, and for structured data, 61 out of 99 (61.6 %) MATUs corresponding to 9,218 patients, 59,643 hospital stays and 104,737 hospital prescriptions. The volume of data in the CDW demonstrated the potential for using these data in clinical research, although not all MATUs are present in the CDW (16 missing for unstructured data and 38 for structured data). The knowledge organisation system proposed here improves the understanding of MATUs and the quality of queries, and helps clinical researchers retrieve relevant medical information. The use of this model in a CDW allows for the rapid identification of a large number of patients and health documents, either directly by a MATU of interest (e.g. Rituximab) or by searching for parent concepts (e.g. Anti-CD20 Monoclonal Antibody).
Subject(s)
Antibodies, Monoclonal , Vocabulary, Controlled , Humans , Antibodies, Monoclonal/therapeutic use , Systematized Nomenclature of Medicine , Data Warehousing , Europe
Abstract
BACKGROUND: Population variant analysis is of great importance for gathering insights into the links between human genotype and phenotype. The 1000 Genomes Project established a valuable reference for human genetic variation; however, the integrative use of the corresponding data with other datasets within existing repositories and pipelines is not fully supported. Particularly, there is a pressing need for flexible and fast selection of population partitions based on their variant and metadata-related characteristics. RESULTS: Here, we target general germline or somatic mutation data sources for their seamless inclusion within an interoperable-format repository, supporting integration among them and with other genomic data, as well as their integrated use within bioinformatic workflows. In addition, we provide VarSum, a data summarization service working on sub-populations of interest selected using filters on population metadata and/or variant characteristics. The service is developed as an optimized computational framework with an Application Programming Interface (API) that can be called from within any existing computing pipeline or programming script. Provided example use cases of biological interest show the relevance, power and ease of use of the API functionalities. CONCLUSIONS: The proposed data integration pipeline and data set extraction and summarization API pave the way for solid computational infrastructures that quickly process cumbersome variation data, and allow biologists and bioinformaticians to easily perform scalable analysis on user-defined partitions of large cohorts from increasingly available genetic variation studies. With the current tendency to large (cross)nation-wide sequencing and variation initiatives, we expect an ever growing need for the kind of computational support hereby proposed.
Subject(s)
Genomics , Metadata , Computational Biology , Genotype , Humans , Software
Abstract
Organizational, administrative, and educational challenges in establishing and sustaining biomedical data science infrastructures lead to the inefficient use of Research Patient Data Repositories (RPDRs). The challenges, including but not limited to deployment, sustainability, cost optimization, collaboration, governance, security, rapid response, reliability, stability, scalability, and convenience, restrict each other and may not be naturally alleviated through traditional hardware upgrades or protocol enhancements. This article attempts to borrow data science thinking and practices in the business realm, which we call the data industry viewpoint, to improve RPDRs.
Subject(s)
Databases as Topic , Humans
Abstract
Nowadays, manufacturers are shifting from a traditional product-centric business paradigm to a service-centric one by offering products that are accompanied by services, which is known as Product-Service Systems (PSSs). PSS customization entails configuring products with varying degrees of differentiation to meet the needs of various customers. This is combined with service customization, in which configured products are expanded by customers to include smart IoT devices (e.g., sensors) to improve product usage and facilitate the transition to smart connected products. The concept of PSS customization is gaining significant interest; however, there are still numerous challenges that must be addressed when designing and offering customized PSSs, such as choosing the optimum types of sensors to install on products and their adequate locations during the service customization process. In this paper, we propose a data warehouse-based recommender system that collects and analyzes large volumes of product usage data from similar products to the product that the customer needs to customize by adding IoT smart devices. The analysis of these data helps in identifying the most critical parts with the highest number of incidents and the causes of those incidents. As a result, sensor types are determined and recommended to the customer based on the causes of these incidents. The utility and applicability of the proposed RS have been demonstrated through its application in a case study that considers the rotary spindle units of a CNC milling machine.
Subject(s)
Commerce , Data Warehousing
Abstract
BACKGROUND: The definition and the treatment of male urinary tract infections (UTIs) are imprecise. This study aims to determine the frequency of male UTIs in consultations of general practice, the diagnostic approach and the prescribed treatments. METHODS: We extracted the consultations of male patients, aged 18 years or more, during the period 2012-17 with the International Classification of Primary Care, version 2 codes for UTIs or associated symptoms from PRIMEGE/MEDISEPT databases of primary care. For eligible consultations in which all symptoms or codes were consistent with male UTIs, we identified patient history, prescribed treatments, antibiotic duration, clinical conditions, additional examinations and bacteriological results of urine culture. RESULTS: Our study included 610 consultations with 396 male patients (mean age 62.5 years). Male UTIs accounted for 0.097% of visits and 1.44 visits per physician per year. The UTIs most commonly identified were: undifferentiated (52%), prostatitis (36%), cystitis (8.5%) and pyelonephritis (3.5%). Fever was recorded in 14% of consultations. Urine dipstick test was done in 1.8% of consultations. Urine culture was positive for Escherichia coli in 50.4% of bacteriological tests. Fluoroquinolones were the most prescribed antibiotics (64.9%), followed by beta-lactams (17.4%), trimethoprim-sulfamethoxazole (11.9%) and nitrofurantoin (2.6%). CONCLUSIONS: Male UTIs are rare in general practice and have different presentations. The definition of male UTIs needs to be specified by prospective studies. Diagnostic evidence of male cystitis may reduce the duration of antibiotic therapy and spare critical antibiotics.
The definition and the treatment of male urinary tract infections (UTIs) are imprecise. We aimed to determine the frequency of male UTIs, the diagnostic approach and the prescribed treatments in French electronic health records of general practice. Our study included 610 consultations with 396 male patients with UTIs. In most cases, the organic site of the UTI was not determined. Prostatitis, cystitis and pyelonephritis were diagnosed to a lesser degree. Most patients did not have fever. Half of urine cultures were positive for Escherichia coli, a bacterium from the gastrointestinal tract. Antibiotics were the treatment of choice for male UTIs. In our study, fluoroquinolones (FQs) were the most prescribed antibiotics, then beta-lactams, trimethoprim-sulfamethoxazole and nitrofurantoin. All infections were treated in the same way. Male UTIs are rare in general practice and have different presentations. The resistance of bacteria to FQs is increasing. General practitioners should prescribe antibiotics carefully to avoid failure in the event of recurrent infections. Treating cystitis, prostatitis and pyelonephritis differently may reduce the duration of antibiotic therapy and spare critical antibiotics.
Subject(s)
General Practice , Urinary Tract Infections , Anti-Bacterial Agents/therapeutic use , Electronics , Humans , Male , Middle Aged , Prospective Studies , Urinary Tract Infections/diagnosis , Urinary Tract Infections/drug therapy , Urinary Tract Infections/epidemiology
Abstract
BACKGROUND: Health services researchers spend a substantial amount of time performing integration, cleansing, interpretation, and aggregation of raw data from multiple public or private data sources. Often, each researcher (or someone in their team) duplicates this effort for their own project, facing the same challenges and experiencing the same pitfalls discovered by those before them. OBJECTIVE: This paper describes a design process for creating a data warehouse that includes the most frequently used databases in health services research. METHODS: The design is based on a conceptual iterative process model framework that utilizes the sociotechnical systems theory approach and includes the capacity for subsequent updates of the existing data sources and the addition of new ones. We introduce the theory and the framework and then explain how they are used to inform the methodology of this study. RESULTS: The application of the iterative process model to the design research process of problem identification and solution design for the Healthcare Research and Analytics Data Infrastructure Solution (HRADIS) is described. Each phase of the iterative model produced end products to inform the implementation of HRADIS. The analysis phase produced the problem statement and requirements documents. The projection phase produced a list of tasks and goals for the ideal system. Finally, the synthesis phase provided the process for a plan to implement HRADIS. HRADIS structures and integrates data dictionaries provided by the data sources, allowing the creation of dimensions and measures for a multidimensional business intelligence system. We discuss how HRADIS is complemented with a set of data mining, analytics, and visualization tools to enable researchers to more efficiently apply multiple methods to a given research project. 
HRADIS also includes a built-in security and account management framework for data governance purposes to ensure customized authorization depending on user roles and parts of the data the roles are authorized to access. CONCLUSIONS: To address existing inefficiencies during the obtaining, extracting, preprocessing, cleansing, and filtering stages of data processing in health services research, we envision HRADIS as a full-service data warehouse integrating frequently used data sources, processes, and methods along with a variety of data analytics and visualization tools. This paper presents the application of the iterative process model to build such a solution. It also includes a discussion on several prominent issues, lessons learned, reflections and recommendations, and future considerations, as this model was applied.
Subject(s)
Data Science/methods , Data Warehousing/methods , Databases, Factual/standards , Health Services Research/methods , Humans
Abstract
BACKGROUND: In the UK, several initiatives have resulted in the creation of local data warehouses of electronic patient records. Originally developed for commissioning and direct patient care, they are potentially useful for research, but little is known about them outside their home area. We describe one such local warehouse, the Whole Systems Integrated Care (WSIC) database in NW London, and its potential for research as the "Discover" platform. We compare Discover with the Clinical Practice Research Datalink (CPRD), a popular UK research database also based on linked primary care records. METHODS: We describe the key features of the Discover database, including scope, architecture and governance; descriptive analyses compare the population demographics and chronic disease prevalences with those in CPRD. RESULTS: As of June 2019, Discover held records for a total of 2.3 million currently registered patients, or 95% of the NW London population; CPRD held records for over 11 million. The Discover population matches the overall age-sex distribution of the UK and CPRD but is more ethnically diverse. Most Discover chronic disease prevalences were comparable to the national rates. Unlike CPRD, Discover has identifiable care organisations and postcodes, allowing mapping and linkage to healthcare provider variables such as staffing, and includes contacts with social, community and mental health care. Discover also includes a consent-to-contact register of over 3000 volunteers to date for prospective studies. CONCLUSIONS: Like CPRD, Discover has been a number of years in the making, is a valuable research tool, and can serve as a model for other areas developing similar data warehouses.
Subject(s)
Delivery of Health Care, Integrated , Electronic Health Records , Databases, Factual , London , Prospective Studies , Research
Abstract
The increasing digitalization of social life opens up new possibilities for modern health care. This article describes innovative application possibilities that could help to sustainably improve the treatment of severe injuries in the future with the help of methods such as big data, artificial intelligence, intelligence augmentation, and machine learning. For the successful application of these methods, suitable data sources must be available. The TraumaRegister DGU® (TR-DGU) currently represents the largest database in Germany in the field of care for severely injured patients that could potentially be used for digital innovations. In this context, it is a good example of the problem areas such as data transfer, interoperability, standardization of data sets, parameter definitions, and ensuring data protection, which still represent major challenges for the digitization of trauma care. In addition to the further development of new analysis methods, solutions must also continue to be sought to the question of how best to intelligently link the relevant data from the various data sources.
Subject(s)
Artificial Intelligence , Emergency Medical Services , Multiple Trauma , Databases, Factual , Germany , Humans , Registries
Abstract
Over the last few years, various types of access control models have been proposed for expressing the growing needs of organizations. Out of these, there is an increasing interest towards specification and enforcement of flexible and dynamic decision making security policies using Attribute Based Access Control (ABAC). However, it is not easy to migrate an existing security policy specified in a different model into ABAC. Furthermore, there exists no comprehensive approach that can specify, enforce and manage ABAC policies along with other policies potentially already existing in the organization as a unified security policy. In this article, we present a unique and flexible solution that enables concurrent specification and enforcement of such security policies through storing and querying data in a multi-dimensional and multi-granular data model. Specifically, we present a unified database schema, similar to that traditionally used in data warehouse design, that can represent different types of access control policies and store relevant policies as in-memory data, thereby significantly reducing the execution time of access request evaluation. We also present a novel approach for combining multiple access control policies through meta-policies. For ease of management, an administrative schema is presented that can specify different types of administrative policies. Extensive experiments on a wide range of data sets demonstrate the viability of the proposed approach.
Subject(s)
Child Health , Data Science , Health Services Research , Pediatrics , Child , Humans
Abstract
BACKGROUND: Atrial fibrillation is associated with an increased risk of cardiovascular hospitalization (CVH), which may be triggered by changes in daily burden. Machine learning of dynamic trends in atrial fibrillation burden, as measured by insertable cardiac monitors (ICMs), may be useful in predicting near-term CVH. METHODS: Using Optum's deidentified Clinformatics Data Mart Database (2007-2019), linked with the Medtronic CareLink ICM database, we identified patients with >1 days of ICM-detected atrial fibrillation. ICM-detected diagnostic parameters were transformed into simple moving averages over different periods for daily follow-up. A diagnostic trend was defined as the comparison of 2 simple moving averages of different periods for each diagnostic parameter. CVH was defined as any hospital, emergency department, or ambulatory surgical center encounter with a cardiovascular diagnosis-related group or diagnosis code. Machine learning was used to determine which diagnostic trends could best predict patient risk 5 days before CVH. RESULTS: A total of 2616 patients with ICMs met the inclusion criteria (71±11 years; 55% male). Among them, 1998 (76%) had a planned or unplanned CVH over 605 363 days. Machine learning revealed distinct groups: (A) sinus rhythm (reference), (B) below-average burden, (C) above-average burden, and (D) above-average burden with decreasing patient activity. The relative risk was increased in all groups versus the reference (B, 4.49 [95% CI, 3.74-5.40]; C, 8.41 [95% CI, 7.00-10.11]; D, 11.15 [95% CI, 9.10-13.65]), including a 21% increase in CVH detection over prespecified burden thresholds of duration (≥1 hour) and quantity (≥5%). The area under the receiver operating characteristic curve increased from 0.55 when using hourly burden amounts to 0.66 when using burden trends and decreasing patient activity (P<0.001), a 20% increase in predictive power. 
CONCLUSIONS: Trends in atrial fibrillation were strongly associated with near-term CVH, especially above-average burden coupled with low patient activity. This approach could provide actionable information to guide treatment and reduce CVH.
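The diagnostic trend described in the methods (a comparison of 2 simple moving averages of different periods) can be sketched as follows. This is an illustration only: the window lengths, the use of a difference as the "comparison", and the sample values are assumptions, not details taken from the study.

```python
from typing import List, Optional


def simple_moving_average(values: List[float], window: int) -> List[Optional[float]]:
    """Trailing simple moving average; None until `window` values are available."""
    if window <= 0:
        raise ValueError("window must be positive")
    out: List[Optional[float]] = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out


def diagnostic_trend(values: List[float], short_window: int, long_window: int) -> List[Optional[float]]:
    """Short-period SMA minus long-period SMA (assumed form of the comparison)."""
    short = simple_moving_average(values, short_window)
    long_ = simple_moving_average(values, long_window)
    return [None if s is None or l is None else s - l for s, l in zip(short, long_)]


# Hypothetical daily atrial fibrillation burden in hours (not study data).
daily_burden_hours = [0.0, 0.0, 1.0, 2.0, 4.0, 8.0]
print(diagnostic_trend(daily_burden_hours, short_window=2, long_window=4))
```

A positive value means the recent average exceeds the longer-run average, that is, the burden is rising.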
Abstract
BACKGROUND: Little is known about how use patterns of medications for opioid use disorder (MOUDs) evolve from pre-incarceration to post-incarceration among incarcerated individuals with opioid use disorder. This article describes pre- and post-incarceration MOUD receipt during a period when naltrexone was the only type of MOUD offered in a state prison system, the Massachusetts Department of Correction (MADOC). METHODS: A retrospective cohort study of individuals with opioid use disorder who had an incarceration episode in MADOC during January 2015 to March 2019. The data source was the Massachusetts Public Health Data Warehouse, a multi-sector data platform that links individual-level data from multiple statewide datasets. We described patterns of MOUD receipt during the four weeks prior to and after an incarceration episode. Multivariable logistic regression models characterized predictors of post-incarceration MOUD receipt. RESULTS: In the male sample (n=691 incarcerations), from the pre- to post-incarceration periods, receipt of buprenorphine increased (14.3 % to 18.3 %), naltrexone increased (5.0 % to 10.5 %), and methadone decreased (4.7 % to 1.7 %). Similarly, in the female sample (n=892 incarcerations), from the pre- to post-incarceration periods, receipt of buprenorphine increased (10.3 % to 12.3 %), naltrexone increased (4.5 % to 9.3 %), and methadone decreased (5.0 % to 2.9 %). Much of the post-release naltrexone receipt occurred among participants in MADOC's pre-release naltrexone program. CONCLUSIONS: MOUD receipt was low but increased slightly in the post-incarceration period. This change was driven by increases in buprenorphine and naltrexone and occurred despite decreases in methadone.
Subject(s)
Incarceration , Narcotic Antagonists , Opiate Substitution Treatment , Opioid-Related Disorders , Female , Humans , Male , Buprenorphine/therapeutic use , Cohort Studies , Incarceration/statistics & numerical data , Massachusetts/epidemiology , Methadone/therapeutic use , Naltrexone/therapeutic use , Narcotic Antagonists/therapeutic use , Opiate Substitution Treatment/statistics & numerical data , Opioid-Related Disorders/drug therapy , Opioid-Related Disorders/epidemiology , Prisoners , Retrospective Studies
Abstract
Data harmonization is an important step in large-scale data analysis and for generating evidence on real world data in healthcare. With the OMOP common data model, a relevant instrument for data harmonization is available that is being promoted by different networks and communities. At the Hannover Medical School (MHH) in Germany, an Enterprise Clinical Research Data Warehouse (ECRDW) is established and harmonization of that data source is the focus of this work. We present MHH's first implementation of the OMOP common data model on top of the ECRDW data source and demonstrate the challenges concerning the mapping of German healthcare terminologies to a standardized format.
Subject(s)
Data Analysis , Data Warehousing , Germany , Health Facilities , Schools, Medical
Abstract
Objective: i2b2 offers the possibility to store biomedical data of different projects in subject oriented data marts of the data warehouse, which potentially requires data replication between different projects and also data synchronization in case of data changes. We present an approach that can save this effort and assess its query performance in a case study that reflects real-world scenarios. Material and Methods: For data segregation, we used PostgreSQL's row level security (RLS) feature, the unit test framework pgTAP for validation and testing as well as the i2b2 application. No change of the i2b2 code was required. Instead, to leverage orchestration and deployment, we additionally implemented a command line interface (CLI). We evaluated performance using 3 different queries generated by i2b2, which we performed on an enlarged Harvard demo dataset. Results: We introduce the open source Python CLI i2b2rls, which orchestrates and manages security roles to implement data marts so that they do not need to be replicated and synchronized as different i2b2 projects. Our evaluation showed that our approach is on average 3.55 and on median 2.71 times slower compared to classic i2b2 data marts, but has more flexibility and easier setup. Conclusion: The RLS-based approach is particularly useful in a scenario with many projects, where data is constantly updated, user and group requirements change frequently or complex user authorization requirements have to be defined. The approach applies to both the i2b2 interface and direct database access.
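To make the row level security idea concrete, the sketch below generates the two PostgreSQL statements that enable RLS on a table and restrict a role to the rows of one project. It is not the i2b2rls implementation: the function `rls_statements`, the role name, and the filter column `project_cd` are hypothetical, and only the `ALTER TABLE ... ENABLE ROW LEVEL SECURITY` and `CREATE POLICY ... USING (...)` statement forms come from PostgreSQL itself.

```python
import re
from typing import List

# Conservative identifier check so that names can be interpolated safely.
_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")


def _checked(name: str) -> str:
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsafe SQL identifier: {name!r}")
    return name


def rls_statements(table: str, column: str, role: str, value: str) -> List[str]:
    """Build SQL that lets `role` read only rows where `column` equals `value`."""
    table, column, role = _checked(table), _checked(column), _checked(role)
    literal = value.replace("'", "''")  # standard SQL escaping of single quotes
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        f"CREATE POLICY {role}_select ON {table} "
        f"FOR SELECT TO {role} USING ({column} = '{literal}');",
    ]


# Hypothetical example: one role per data mart, filtering a shared fact table.
for statement in rls_statements("observation_fact", "project_cd", "mart_cardio", "CARDIO"):
    print(statement)
```

Because every data mart reads from the same table through its own policy, no rows are copied between projects, which matches the motivation given in the abstract.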
Abstract
INTRODUCTION: The cytochrome P450 (CYP450) enzyme system is involved in the metabolism of certain drugs and is responsible for most drug interactions. These interactions result in either an enzymatic inhibition or an enzymatic induction mechanism that has an impact on the therapeutic management of patients. Detecting these drug interactions will allow for better predictability in therapeutic response. Therefore, computerized solutions can represent a valuable help for clinicians in their tasks of detection. OBJECTIVE: The objective of this study is to provide a structured data-source of interactions involving the CYP450 enzyme system. These interactions are aimed to be integrated in the cross-lingual multi-terminology server HeTOP (Health Terminologies and Ontologies Portal), to support the query processing of the clinical data warehouse (CDW) EDSaN (Entrepôt de Données de Santé Normand). MATERIAL AND METHODS: A selection and curation of drug components (DCs) that share a relationship with the CYP450 system was performed from several international data sources. The DCs were linked according to the type of relationship which can be substrate, inhibitor, or inducer. These relationships were then integrated into the HeTOP server. To validate the CYP450 relationships, a semantic query was performed on the CDW, whose search engine is founded on HeTOP data (concepts, terms, and relations). RESULTS: A total of 776 DCs are associated by a new interaction relationship, integrated in HeTOP, by 14 enzymes. These are CYP450 1A2, 2A6, 2B6, 2C8, 2C9, 2C18, 2C19, 2D6, 2E1, 3A4, 3A7, 11B1,11B2 mitochondrial and P-glycoprotein, constituting a total of 2,088 relationships. A general modelling of cytochromic interactions was performed. From this model, 233,006 queries were processed in less than two hours, demonstrating the usefulness and performance of our CDW implementation. 
Moreover, they showed that in our university hospital, the most frequent concurrent prescription that could cause a cytochromic interaction is Bisoprolol with Amiodarone by enzymatic inhibition for 2,493 patients. DISCUSSION: The queries submitted to the CDW EDSaN made it possible to highlight the molecules most often prescribed simultaneously and potentially responsible for cytochromic interactions. In a second step, it would be interesting to evaluate the real clinical impact by looking for possible adverse effects of these interactions in the patients' files. Other computational solutions for cytochromic interactions exist. The impact of CYP450 is particularly important for drugs with a narrow therapeutic window (NTW), as interactions can lead to increased toxicity or therapeutic failure. It is also important to define which drug component is a pro-drug and to consider the many genetic polymorphisms of patients. CONCLUSION: The HeTOP server contains a non-negligible number of relationships between drug components and CYP450 from multiple reference sources. These data allow us to query our Clinical Data Warehouse to highlight these cytochromic interactions. It would be interesting in the future to assess the actual clinical impact in hospital reports.
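The substrate, inhibitor, and inducer relationships described in this abstract lend themselves to a simple pairing rule: a potential interaction exists when one co-prescribed drug is a substrate of an enzyme and another inhibits or induces that same enzyme. The sketch below illustrates that rule under stated assumptions; the relation table uses placeholder drug names and is not HeTOP data, and `find_interactions` is a hypothetical helper rather than the authors' query engine.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Placeholder relations (drug, enzyme, role); NOT taken from HeTOP.
RELATIONS: List[Tuple[str, str, str]] = [
    ("drug_a", "CYP3A4", "substrate"),
    ("drug_a", "CYP2D6", "substrate"),
    ("drug_b", "CYP3A4", "inhibitor"),
    ("drug_c", "CYP2D6", "inducer"),
]

MECHANISM = {"inhibitor": "inhibition", "inducer": "induction"}


def find_interactions(
    prescribed: Iterable[str],
    relations: List[Tuple[str, str, str]] = RELATIONS,
) -> List[Tuple[str, str, str, str]]:
    """Return sorted (modifier, substrate, enzyme, mechanism) tuples."""
    active = set(prescribed)
    substrates: Dict[str, Set[str]] = {}
    modifiers: List[Tuple[str, str, str]] = []
    for drug, enzyme, role in relations:
        if drug not in active:
            continue
        if role == "substrate":
            substrates.setdefault(enzyme, set()).add(drug)
        elif role in MECHANISM:
            modifiers.append((drug, enzyme, MECHANISM[role]))
    hits = [
        (modifier, victim, enzyme, mechanism)
        for modifier, enzyme, mechanism in modifiers
        for victim in substrates.get(enzyme, ())
        if victim != modifier  # a drug does not interact with itself
    ]
    return sorted(hits)


print(find_interactions(["drug_a", "drug_b"]))
```

Running the same function over every set of concurrent prescriptions in a warehouse would yield counts per drug pair, which is the kind of ranking the results describe.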
Subject(s)
Cytochrome P-450 Enzyme System , Data Warehousing , Humans , Cytochrome P-450 Enzyme System/genetics , Cytochrome P-450 Enzyme System/metabolism
ABSTRACT
BACKGROUND: In recent years, health data collected during the clinical care process have often been repurposed for secondary use through clinical data warehouses (CDWs), which interconnect disparate data from different sources. A large amount of information of high clinical value is stored in unstructured text format. Natural language processing (NLP), which implements algorithms that can operate on massive unstructured textual data, has the potential to structure the data and make clinical information more accessible. OBJECTIVE: The aim of this review was to provide an overview of studies applying NLP to textual data from CDWs. It focuses on identifying the (1) NLP tasks applied to data from CDWs and (2) NLP methods used to tackle these tasks. METHODS: This review was performed according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. We searched for relevant articles in 3 bibliographic databases: PubMed, Google Scholar, and ACL Anthology. We reviewed the titles and abstracts and included articles according to the following inclusion criteria: (1) focus on NLP applied to textual data from CDWs, (2) articles published between 1995 and 2021, and (3) written in English. RESULTS: We identified 1353 articles, of which 194 (14.34%) met the inclusion criteria. Among all identified NLP tasks in the included papers, information extraction from clinical text (112/194, 57.7%) and the identification of patients (51/194, 26.3%) were the most frequent tasks. To address the various tasks, symbolic methods were the most common NLP methods (124/232, 53.4%), showing that some tasks can be partially achieved with classical NLP techniques, such as regular expressions or pattern matching that exploit specialized lexica, such as drug lists and terminologies. Machine learning (70/232, 30.2%) and deep learning (38/232, 16.4%) have been increasingly used in recent years, including the most recent approaches based on transformers. 
NLP methods were mostly applied to English language data (153/194, 78.9%). CONCLUSIONS: CDWs are central to the secondary use of clinical texts for research purposes. Although the use of NLP on data from CDWs is growing, there remain challenges in this field, especially with regard to languages other than English. Clinical NLP is an effective strategy for accessing, extracting, and transforming data from CDWs. Information retrieved with NLP can assist in clinical research and have an impact on clinical practice.
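The symbolic methods mentioned in this abstract (regular expressions or pattern matching that exploit a specialized lexicon such as a drug list) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lexicon entries, the sample sentence, and the helper names `build_pattern` and `extract_mentions` are hypothetical and do not come from any of the reviewed studies.

```python
import re

# Hypothetical drug lexicon; a real system would load a curated terminology.
DRUG_LEXICON = ["amiodarone", "bisoprolol", "warfarin"]


def build_pattern(lexicon):
    """Compile one case-insensitive, whole-word pattern from a list of terms."""
    # Longest terms first so that longer entries win over shorter prefixes.
    terms = sorted(lexicon, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in terms)
    return re.compile(rf"\b({alternatives})\b", re.IGNORECASE)


def extract_mentions(text, pattern):
    """Return (normalized term, start offset) for each lexicon match in text."""
    return [(m.group(1).lower(), m.start()) for m in pattern.finditer(text)]


if __name__ == "__main__":
    note = "Patient on Bisoprolol 5 mg; amiodarone started."
    print(extract_mentions(note, build_pattern(DRUG_LEXICON)))
```

A pattern of this kind handles exact lexicon hits only; misspellings, abbreviations, and negation ("no warfarin") need additional rules or the statistical methods the review also covers.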
ABSTRACT
OBJECTIVE: Patients who receive most care within a single healthcare system (colloquially called a "loyalty cohort" since they typically return to the same providers) have mostly complete data within that organization's electronic health record (EHR). Loyalty cohorts have low data missingness, whereas missing data can unintentionally bias research results. Using proxies of routine care and healthcare utilization metrics, we compute a per-patient score that identifies a loyalty cohort. MATERIALS AND METHODS: We implemented a computable program for the widely adopted i2b2 platform that identifies loyalty cohorts in EHRs based on a machine-learning model, which was previously validated using linked claims data. We developed a novel validation approach, which tests, using only EHR data, whether patients returned to the same healthcare system after the training period. We evaluated these tools at 3 institutions using data from 2017 to 2019. RESULTS: Loyalty cohort calculations to identify patients who returned during a 1-year follow-up yielded a mean area under the receiver operating characteristic curve of 0.77 using the original model and 0.80 after calibrating the model at individual sites. Factors such as multiple medications or visits contributed significantly at all sites. Screening tests' contributions (eg, colonoscopy) varied across sites, likely due to coding and population differences. DISCUSSION: This open-source implementation of a "loyalty score" algorithm had good predictive power. Enriching research cohorts by utilizing these low-missingness patients is a way to obtain the data completeness necessary for accurate causal analysis. CONCLUSION: i2b2 sites can use this approach to select cohorts with mostly complete EHR data.
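The validation idea in this abstract (checking whether a per-patient score ranks patients who returned during follow-up above those who did not) is summarized by the area under the receiver operating characteristic curve. The sketch below computes that statistic from its pairwise definition using only the Python standard library. It is not the i2b2 implementation; the function name `auroc` and the toy scores and labels are hypothetical.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: per-patient scores (higher means more likely to return).
    labels: 1 if the patient returned during follow-up, 0 otherwise.
    Equals the probability that a randomly chosen returning patient scores
    higher than a randomly chosen non-returning patient (ties count 0.5).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy data, for illustration only.
    toy_scores = [0.9, 0.8, 0.4, 0.3]
    toy_returned = [1, 0, 1, 0]
    print(auroc(toy_scores, toy_returned))
```

The pairwise loop is quadratic in cohort size, which is fine for a demonstration; at warehouse scale a rank-based computation or a library routine would be the more practical choice.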