Results 1 - 20 of 85
1.
Stud Health Technol Inform ; 316: 1577-1581, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176509

ABSTRACT

Hospital laboratory results are a significant data source in Clinical Data Warehouses (CDW). To ensure comparability across healthcare organizations and for use in research studies, the results need to be interoperable. The LOINC (Logical Observation Identifiers, Names, and Codes) terminology provides a unique identifier for local codes for lab tests, enabling interoperability. However, in the real world, events occur over time and can disrupt the distribution of lab result values. For example, new equipment may be added to the analysis pipeline, a machine may be replaced, formulas may evolve due to new scientific knowledge, and legacy terminologies may be adopted. This article proposes a pipeline for creating an automated dashboard to monitor these events and data quality. We used automatic change point detection methods such as PELT for event detection in lab results. For a given LOINC code, we create a dashboard that summarizes the number of local codes mapped and the number of patients (by sex, age, and hospital service) associated with the code. Finally, the dashboard enables the visualization of events over time that disrupt the signal distribution. The biologists were able to explain the changes for several biological assays.


Subject(s)
Data Warehousing , Humans , Logical Observation Identifiers Names and Codes , Clinical Laboratory Information Systems , Electronic Health Records , User-Computer Interface
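
A minimal sketch of how change point detection of the kind named in the abstract above could flag a shift in lab result values. The abstract names PELT but gives no cost function, penalty, or implementation, so everything here is an assumption: the squared-error (mean-shift) cost, the `penalty` value, and the function name `pelt_mean_shift` are illustrative only and are not the authors' pipeline.

```python
def pelt_mean_shift(values, penalty):
    """Return change point indices using PELT with a squared-error cost.

    Assumption: a change is a shift in the mean. A returned index i means
    a new segment starts at values[i].
    """
    n = len(values)
    # Prefix sums give the cost of any segment in constant time.
    s1, s2 = [0.0], [0.0]
    for v in values:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(a, b):
        # Sum of squared deviations from the mean of values[a:b], b > a.
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = [0.0] * (n + 1)   # best[t]: minimal penalised cost of values[:t]
    best[0] = -penalty       # so that the first segment carries one penalty
    last = [0] * (n + 1)     # last[t]: start of the final segment in values[:t]
    candidates = [0]
    for t in range(1, n + 1):
        scores = [(best[s] + cost(s, t) + penalty, s) for s in candidates]
        best[t], last[t] = min(scores)
        # PELT pruning: drop starts that can never be optimal again.
        candidates = [s for score, s in scores if score - penalty <= best[t]]
        candidates.append(t)

    # Walk back through the segment starts to recover the change points.
    change_points = []
    t = n
    while last[t] > 0:
        change_points.append(last[t])
        t = last[t]
    return sorted(change_points)


# Toy series (not real lab data): the level jumps at index 20.
toy_results = [0.0] * 20 + [5.0] * 20
print(pelt_mean_shift(toy_results, penalty=10.0))
```

A larger `penalty` yields fewer detected events. In practice the value would need tuning per assay, which the abstract does not describe.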
2.
Stud Health Technol Inform ; 316: 1584-1588, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176511

ABSTRACT

This study assesses the effectiveness of the Observational Medical Outcomes Partnership common data model (OMOP CDM) in standardising Continuous Renal Replacement Therapy (CRRT) data from intensive care units (ICU) of two French university hospitals. Our objective was to extract and standardise data from various sources, enabling the development of predictive models for CRRT weaning that are agnostic to the data's origin. Data for 1,696 ICU stays from the two data sources were extracted, transformed, and loaded into the OMOP format after semantic alignment of 46 CRRT standard concepts. Although the OMOP CDM demonstrated potential in harmonising CRRT data, we encountered challenges related to data variability and the lack of standard concepts. Despite these challenges, our study supports the promise of the OMOP CDM for ICU data standardization, suggesting that further refinement and adaptation could significantly improve clinical decision making and patient outcomes in critical care settings.


Subject(s)
Intensive Care Units , Humans , France , Intensive Care Units/standards , Continuous Renal Replacement Therapy , Data Accuracy , Critical Care/standards , Renal Replacement Therapy/standards
3.
Stud Health Technol Inform ; 316: 1605-1606, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176517

ABSTRACT

This paper presents the development of a visualization dashboard for quality indicators in intensive care units (ICUs), using the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). The dashboard enables the user to visualize quality indicator data using histograms, pie charts and tables. Our project uses the OMOP CDM, ensuring a seamless implementation of our dashboard across various hospitals. Future directions for our research include expanding the dashboard to incorporate additional quality indicators and evaluating clinicians' feedback on its effectiveness.


Subject(s)
Intensive Care Units , Quality Indicators, Health Care , Intensive Care Units/standards , Critical Care/standards , Humans , User-Computer Interface , Outcome Assessment, Health Care , Benchmarking
4.
Stud Health Technol Inform ; 316: 1739-1743, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176549

ABSTRACT

Continuous unfractionated heparin is widely used in intensive care, yet its complex pharmacokinetic properties complicate the determination of appropriate doses. To address this challenge, we developed machine learning models to predict over- and under-dosing, based on anti-Xa results, using a monocentric retrospective dataset. The random forest model achieved a mean AUROC of 0.80 [0.77-0.83], while the XGB model reached a mean AUROC of 0.80 [0.76-0.83]. Feature importance was employed to enhance the interpretability of the model, a critical factor for clinician acceptance. After prospective validation, machine learning models such as those developed in this study could be implemented within a computerized physician order entry (CPOE) as a clinical decision support system (CDSS).


Subject(s)
Anticoagulants , Decision Support Systems, Clinical , Heparin , Intensive Care Units , Machine Learning , Heparin/therapeutic use , Humans , Anticoagulants/therapeutic use , Medical Order Entry Systems , Retrospective Studies
5.
Stud Health Technol Inform ; 316: 221-225, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176713

ABSTRACT

This paper introduces a novel approach aimed at enhancing the accessibility of clinical data warehouses (CDWs) for external users, particularly researchers and biomedical companies interested in developing and testing their solutions. The primary focus is on proposing a clinical data catalogue designed to elucidate the contents of CDWs, facilitating biomedical project launch and completion. The catalogue is designed to address three fundamental inquiries that external users may have regarding CDWs: "What data is available, how much data is present, and how was it generated?" Additionally, the paper showcases a prototype of the catalogue through a visualization example, utilizing data from the CDW of Rennes University Hospital.


Subject(s)
Data Warehousing , Electronic Health Records , Humans
6.
Stud Health Technol Inform ; 316: 611-615, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176816

ABSTRACT

Secure extraction of Personally Identifiable Information (PII) from Electronic Health Records (EHRs) presents significant privacy and security challenges. This study explores the application of Federated Learning (FL) to overcome these challenges within the context of French EHRs. By utilizing a multilingual BERT model in an FL simulation involving 20 hospitals, each represented by a unique medical department or pole, we compared the performance of two setups: individual models, where each hospital uses only its own training and validation data without engaging in the FL process, and federated models, where multiple hospitals collaborate to train a global FL model. Our findings demonstrate that FL models not only preserve data confidentiality but also outperform the individual models. In fact, the Global FL model achieved an F1 score of 75.7%, comparable to, though slightly below, that of the Centralized approach at 78.5%. This research underscores the potential of FL in extracting PII from EHRs, encouraging its broader adoption in health data analysis.


Subject(s)
Computer Security , Confidentiality , Electronic Health Records , Machine Learning , France , Humans , Health Records, Personal
7.
Stud Health Technol Inform ; 316: 1979-1983, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176881

ABSTRACT

Electronic health data concerning implantable medical devices (IMD) open opportunities for dynamic real-world monitoring to assess the risks related to implanted materials. Due to population ageing and expanding demands, total hip, knee, and shoulder arthroplasties are increasing. Automating the collection and analysis of orthopedic device features could benefit physicians and public health policies, enabling early issue detection, IMD monitoring and patient safety assessment. A machine learning tool using natural language processing (NLP) was developed for the automated extraction of operation information from medical reports in orthopedics. A corpus of 959 orthopaedic operative reports from 5 centres was manually annotated using the Prodigy software® with a strong inter-annotator agreement of 0.80. The data to extract comprised key clinical and procedure information (n = 9) selected by a multidisciplinary group based on the French health authority checklist. The NLP model achieved strong overall precision and recall of 97.0 and 96.0, respectively, with an F1-score of 96.3. Systematic monitoring of orthopedic devices could be ensured by an automated tool, leveraging clinical data warehouses. Traceability of medical devices with implantation modalities will allow detection of implant factors leading to complications. The evidence from real-world data could provide concrete and dynamic insights to surgeons and infectious disease specialists concerning implant follow-up, guiding therapeutic decision-making, and informing public health policymakers. The tool will be applied on clinical data warehouses to automate information extraction and presentation, providing feedback on mandatory information completion and contents of operative reports to support improvements, and thereafter implant research projects.


Subject(s)
Electronic Health Records , Machine Learning , Natural Language Processing , France , Humans , Orthopedic Procedures
8.
J Thromb Haemost ; 22(10): 2864-2872, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39019439

ABSTRACT

BACKGROUND: Tinzaparin could be easier to manage than unfractionated heparin in patients with severe renal impairment. However, clinical and pharmacologic data regarding its use in such patients are lacking. OBJECTIVES: The aims of this study were to determine, in patients with estimated glomerular filtration rate (eGFR) of <30 mL.min⁻¹, tinzaparin pharmacokinetics (PK) parameters using a population PK approach and bleeding and thrombotic complications. METHODS: We performed a retrospective observational single-center study, including in-patients with eGFR of <30 mL.min⁻¹ receiving prophylactic (4500 IU.d⁻¹) or therapeutic (175 IU.kg⁻¹.d⁻¹) tinzaparin. Measured anti-Xa levels were analyzed using a nonlinear mixed-effects modeling approach. Individual predicted tinzaparin exposure markers at steady state were calculated for each patient and dosing regimen. The PK was also evaluated through Monte Carlo simulations based on the final covariate model parameter estimates. RESULTS: Over a 22-month period, 802 tinzaparin treatment periods in 623 patients were analyzed: two-thirds received a prophylactic dose, 66% had an eGFR of <20 mL.min⁻¹, and 25% were on renal replacement therapy. In patients for whom anti-Xa measurements were performed (n = 199; 746 values), PK parameters, profiles, and maximum plasma concentrations were comparable with those in patients without renal impairment or in healthy volunteers. In the whole population, major bleeding occurred in 2.4% and 3.5% of patients receiving prophylactic and therapeutic doses over a median 9- and 7-day treatment period, respectively. No patients had thrombotic complications. CONCLUSION: Tinzaparin PK parameters and profiles were not affected by renal impairment. This suggests that tinzaparin, at therapeutic or prophylactic dose, could be an alternative to unfractionated heparin in hospitalized patients with severe renal impairment.


Subject(s)
Glomerular Filtration Rate , Hemorrhage , Heparin, Low-Molecular-Weight , Heparin , Kidney Failure, Chronic , Tinzaparin , Humans , Tinzaparin/administration & dosage , Tinzaparin/pharmacokinetics , Retrospective Studies , Male , Female , Aged , Middle Aged , Hemorrhage/chemically induced , Heparin/administration & dosage , Heparin/pharmacokinetics , Heparin/adverse effects , Kidney Failure, Chronic/complications , Kidney Failure, Chronic/blood , Heparin, Low-Molecular-Weight/administration & dosage , Heparin, Low-Molecular-Weight/pharmacokinetics , Fibrinolytic Agents/administration & dosage , Fibrinolytic Agents/pharmacokinetics , Fibrinolytic Agents/adverse effects , Anticoagulants/administration & dosage , Anticoagulants/pharmacokinetics , Anticoagulants/adverse effects , Thrombosis/blood , Thrombosis/drug therapy , Thrombosis/prevention & control , Monte Carlo Method , Aged, 80 and over , Treatment Outcome
9.
Open Heart ; 11(1)2024 May 03.
Article in English | MEDLINE | ID: mdl-38702088

ABSTRACT

BACKGROUND: Systemic lupus erythematosus (SLE) is a heterogeneous autoimmune disease. Cardiac involvement in SLE is rare but plays an important prognostic role. The degree of cardiac involvement according to SLE subsets defined by non-cardiac manifestations is unknown. The objective of this study was to identify differences in transthoracic echocardiography (TTE) parameters associated with different SLE subgroups. METHODS: One hundred eighty-one patients who fulfilled the 2019 American College of Rheumatology/EULAR classification criteria for SLE and underwent baseline TTE were included in this cross-sectional study. We defined four subsets of SLE based on the predominant clinical manifestations. A multivariate multinomial regression analysis was performed to determine whether TTE parameters differed between groups. RESULTS: Four clinical subsets were defined according to non-cardiac clinical manifestations: group A (n=37 patients) showed features of mixed connective tissue disease, group B (n=76 patients) had primarily cutaneous involvement, group C (n=18) exhibited prominent serositis and group D (n=50) had severe, multi-organ involvement, including notable renal disease. Forty TTE parameters were assessed between groups. Per multivariate multinomial regression analysis, there were statistically significant differences in early diastolic tricuspid annular velocity (RV-Ea, p<0.0001), RV S' wave (p=0.0031) and RV end-diastolic diameter (p=0.0419) between the groups. Group B (primarily cutaneous involvement) had the lowest degree of RV dysfunction. CONCLUSION: When defining clinical phenotypes of SLE based on organ involvement, we found four distinct subgroups which showed notable differences in RV function on TTE. Risk-stratifying patients by clinical phenotype could help better tailor cardiac follow-up in this population.


Subject(s)
Echocardiography , Heart Ventricles , Lupus Erythematosus, Systemic , Ventricular Function, Right , Humans , Lupus Erythematosus, Systemic/complications , Lupus Erythematosus, Systemic/diagnosis , Lupus Erythematosus, Systemic/physiopathology , Female , Male , Cross-Sectional Studies , Adult , Middle Aged , Ventricular Function, Right/physiology , Echocardiography/methods , Heart Ventricles/diagnostic imaging , Heart Ventricles/physiopathology , Ventricular Dysfunction, Right/physiopathology , Ventricular Dysfunction, Right/etiology , Ventricular Dysfunction, Right/diagnostic imaging , Retrospective Studies , Prognosis
10.
BMC Med Inform Decis Mak ; 24(1): 54, 2024 Feb 16.
Article in English | MEDLINE | ID: mdl-38365677

ABSTRACT

BACKGROUND: Electronic health records (EHRs) contain valuable information for clinical research; however, the sensitive nature of healthcare data presents security and confidentiality challenges. De-identification is therefore essential to protect personal data in EHRs and comply with government regulations. Named entity recognition (NER) methods have been proposed to remove personal identifiers, with deep learning-based models achieving better performance. However, manual annotation of training data is time-consuming and expensive. The aim of this study was to develop an automatic de-identification pipeline for all kinds of clinical documents based on a distantly supervised method to significantly reduce the cost of manual annotations and to facilitate the transfer of the de-identification pipeline to other clinical centers. METHODS: We proposed an automated annotation process for French clinical de-identification, exploiting data from the eHOP clinical data warehouse (CDW) of the CHU de Rennes and national knowledge bases, as well as other features. In addition, this paper proposes an assisted data annotation solution using the Prodigy annotation tool. This approach aims to reduce the cost required to create a reference corpus for the evaluation of state-of-the-art NER models. Finally, we evaluated and compared the effectiveness of different NER methods. RESULTS: A French de-identification dataset was developed in this work, based on EHRs provided by the eHOP CDW at Rennes University Hospital, France. The dataset was rich in terms of personal information, and the distribution of entities was quite similar in the training and test datasets. We evaluated a Bi-LSTM + CRF sequence labeling architecture, combined with Flair + FastText word embeddings, on a test set of manually annotated clinical reports. The model outperformed the other tested models with an F1 score of 96.96%, demonstrating the effectiveness of our automatic approach for de-identifying sensitive information. CONCLUSIONS: This study provides an automatic de-identification pipeline for clinical notes, which can facilitate the reuse of EHRs for secondary purposes such as clinical research. Our study highlights the importance of using advanced NLP techniques for effective de-identification, as well as the need for innovative solutions such as distant supervision to overcome the challenge of limited annotated data in the medical domain.


Subject(s)
Deep Learning , Humans , Data Anonymization , Electronic Health Records , Cost-Benefit Analysis , Confidentiality , Natural Language Processing
11.
J Am Geriatr Soc ; 72(4): 1060-1069, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38348519

ABSTRACT

BACKGROUND: Antibiotics play a central role in infection management. In older patients, antibiotics are frequently administered subcutaneously. Ceftriaxone pharmacokinetics after subcutaneous administration is well documented, but little data are available on its safety. METHODS: We compared the occurrence of adverse events associated with ceftriaxone administered subcutaneously versus intravenously in ≥75-year-old patients. We used data from a single-center, retrospective, clinical-administrative database to compare the occurrence of adverse events at day 14 and outcome at day 21 in older patients who received ceftriaxone via the subcutaneous route or the intravenous route at Rennes University Hospital, France, from May 2020 to February 2023. RESULTS: The subcutaneous and intravenous groups included 402 and 3387 patients, respectively. Patients in the subcutaneous group were older and more likely to receive palliative care. At least one adverse event was reported for 18% and 40% of patients in the subcutaneous and intravenous group, respectively (RR = 2.21). Mortality at day 21 was higher in the subcutaneous route group, which could be linked to between-group differences in clinical and demographic features. CONCLUSIONS: In ≥75-year-old patients, ceftriaxone administered by the subcutaneous route is associated with fewer adverse events than by the intravenous route. The subcutaneous route, which is easier to use, has a place in infection management in geriatric settings.


Subject(s)
Anti-Bacterial Agents , Ceftriaxone , Humans , Aged , Ceftriaxone/adverse effects , Retrospective Studies , Infusions, Intravenous , Administration, Intravenous , Anti-Bacterial Agents/adverse effects
12.
Eur Heart J Open ; 4(1): oead133, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38196848

ABSTRACT

Aims: Patients presenting symptoms of heart failure with preserved ejection fraction (HFpEF) are not a homogenous population. Different phenotypes can differ in prognosis and optimal management strategies. We sought to identify phenotypes of HFpEF by using the medical information database from a large university hospital centre using machine learning. Methods and results: We explored the use of clinical variables from electronic health records in addition to echocardiography to identify different phenotypes of patients with HFpEF. The proposed methodology identifies four phenotypic clusters based on both clinical and echocardiographic characteristics, which have differing prognoses (death and cardiovascular hospitalization). Conclusion: This work demonstrated that artificial intelligence-derived phenotypes could be used as a tool for physicians to assess risk and to target therapies that may improve outcomes.

13.
Article in English | MEDLINE | ID: mdl-37831905

ABSTRACT

OBJECTIVES: Systemic lupus erythematosus (SLE) is a systemic autoimmune disease characterized by heterogeneous manifestations and severity, with frequent lung involvement. Among pulmonary function tests (PFT), the measure of the diffusing capacity of the lungs for carbon monoxide (DLCO) is a noninvasive and sensitive tool assessing pulmonary microcirculation. Asymptomatic and isolated DLCO alteration has been frequently reported in SLE, but its clinical relevance has not been established. METHODS: This retrospective study focused on 232 SLE patients fulfilling the 2019 EULAR/ACR classification criteria for SLE. Data were collected from the patient's medical record, including demographic, clinical, and immunological characteristics while DLCO was measured when performing PFT as part of routine patient follow-up. RESULTS: At the end of follow-up, DLCO alteration (<70% of predicted value) was measured at least once in 154 patients (66.4%), and was associated with a history of smoking as well as interstitial lung disease (ILD), but was also associated with renal and neurological involvement. History of smoking, detection of anti-nucleosome autoantibodies and clinical lymphadenopathy at diagnosis were independent predictors of DLCO alteration, while early cutaneous involvement with photosensitivity was a protective factor. DLCO alteration, at baseline or anytime during follow-up was predictive of admission in intensive care unit and/or of all-cause death, both mainly due to severe disease flares and premature cardiovascular complications. CONCLUSION: This study suggests a link between DLCO alteration and disease damage, potentially related to SLE vasculopathy, and prognostic value of DLCO on death or ICU admission in SLE.

14.
Stud Health Technol Inform ; 302: 342-343, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203675

ABSTRACT

In France and in other countries, we observed a significant growth in human polyvalent immunoglobulins (PvIg) usage. PvIg is manufactured from plasma collected from numerous donors, and its production is complex. Supply tensions have been observed for several years, and it is necessary to limit PvIg consumption. Therefore, the French Health Authority (FHA) provided guidelines in June 2018 to restrict PvIg usage. This research aims to assess the impact of the FHA guidelines on the use of PvIg. We analyzed data from Rennes University Hospital (RUH), where all PvIg prescriptions are reported electronically with quantity, rhythm, and indication. From the clinical data warehouse of RUH, we extracted comorbidities and lab results to evaluate the more complex guidelines. Overall, we noticed a reduction in the consumption of PvIg after the guidelines. Compliance with the recommended quantities and rhythms has also been observed. By combining two sources of data, we have been able to show an impact of the FHA's guidelines on the consumption of PvIg.


Subject(s)
Data Warehousing , Immunoglobulins , Humans , Drug Prescriptions , Comorbidity , France
15.
Antibiotics (Basel) ; 12(4)2023 Mar 30.
Article in English | MEDLINE | ID: mdl-37107042

ABSTRACT

BACKGROUND: Amoxicillin (AMX)-induced neurotoxicity is well described and may be associated with AMX overexposure. No neurotoxic concentration threshold has been determined thus far. A better knowledge of maximum tolerable AMX concentrations is of importance to improve the safety of high doses of AMX. METHODS: We conducted a retrospective study using the local hospital data warehouse EhOP® to generate a specific query related to AMX neurotoxicity symptomatology. All patient medical reports containing a mention of neurotoxicity clinical symptoms coupled with AMX plasma concentration measurements were explored. Patients were classified into two groups according to the imputability of AMX in the onset of their neurotoxicity, on the basis of chronological and semiological criteria. A receiver-operating characteristic curve analysis was performed to identify an AMX neurotoxic steady-state concentration (Css) threshold. RESULTS: The query identified 101 patients among 2054 patients benefiting from AMX TDM. Patients received a median daily dose of 9 g AMX, with a median creatinine clearance of 51 mL/min. A total of 17 of the 101 patients exhibited neurotoxicity attributed to AMX. The mean Css was higher for patients with neurotoxicity attributed to AMX (118 ± 62 mg/L) than for those without (74 ± 48 mg/L; p = 0.002). A threshold AMX concentration of 109.7 mg/L predicted the occurrence of neurotoxicity. CONCLUSIONS: This study identified, for the first time, an AMX Css threshold of 109.7 mg/L associated with an excess risk of neurotoxicity. This approach needs to be confirmed by a prospective study with systematic neurological evaluation and TDM.
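
One common way to read a concentration threshold off a receiver-operating characteristic curve is to maximise Youden's J (sensitivity + specificity - 1). The sketch below illustrates that idea only. The abstract above does not state which criterion was used, so the choice of Youden's J is an assumption, the function name `youden_threshold` is hypothetical, and the concentrations are made-up toy values, not study data.

```python
def youden_threshold(concentrations, is_toxic):
    """Return (cut-off, J) maximising Youden's J = sensitivity + specificity - 1.

    A patient is predicted positive when concentration >= cut-off.
    Assumes both classes are present in is_toxic.
    """
    positives = sum(1 for y in is_toxic if y)
    negatives = len(is_toxic) - positives
    best_cut, best_j = None, -1.0
    for cut in sorted(set(concentrations)):
        true_pos = sum(1 for c, y in zip(concentrations, is_toxic) if y and c >= cut)
        true_neg = sum(1 for c, y in zip(concentrations, is_toxic) if not y and c < cut)
        j = true_pos / positives + true_neg / negatives - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Toy steady-state concentrations in mg/L with hypothetical labels.
toy_css = [40, 55, 70, 90, 110, 120, 150, 180]
toy_toxic = [False, False, False, False, True, False, True, True]
print(youden_threshold(toy_css, toy_toxic))
```

Other criteria (for example, fixing a minimum specificity) would give a different cut-off from the same curve.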

16.
Health Informatics J ; 29(1): 14604582221146709, 2023.
Article in English | MEDLINE | ID: mdl-36964666

ABSTRACT

Defining profiles of patients that could benefit from relevant anti-cancer treatments is essential. An increasing number of specific criteria must be met for eligibility for specific anti-cancer therapies. This study aimed to develop an automated algorithm able to detect patient and tumor characteristics to reduce the time-consuming prescreening for trial inclusions without delay. Hence, 640 anonymized multidisciplinary team meeting (MTM) reports concerning lung cancers from one French teaching hospital data warehouse between 2018 and 2020 were annotated. To automate the extraction of eight major eligibility criteria, corresponding to 52 classes, regular expressions were implemented. The evaluation of the regular expressions gave an F1-score of 93% on average, a positive predictive value (precision) of 98% and sensitivity (recall) of 92%. However, in MTM reports, the variability in fill rates for patient and tumor information remained large (from 31% to 100%). Genetic mutations and rearrangement test results were the least reported characteristics and also the hardest to automatically extract. To ease prescreening in clinical trials, the PreScIOUs study demonstrated the additional value of rule-based and machine learning-based methods applied to lung cancer MTM reports.


Subject(s)
Lung Neoplasms , Natural Language Processing , Humans , Lung Neoplasms/therapy , Electronic Health Records , Algorithms , Patient Care Team
17.
JMIR Public Health Surveill ; 9: e34982, 2023 01 31.
Article in English | MEDLINE | ID: mdl-36719726

ABSTRACT

BACKGROUND: Disease surveillance systems capable of producing accurate real-time and short-term forecasts can help public health officials design timely public health interventions to mitigate the effects of disease outbreaks in affected populations. In France, existing clinic-based disease surveillance systems produce gastroenteritis activity information that lags real time by 1 to 3 weeks. This temporal data gap prevents public health officials from having a timely epidemiological characterization of this disease at any point in time and thus leads to the design of interventions that do not take into consideration the most recent changes in dynamics. OBJECTIVE: The goal of this study was to evaluate the feasibility of using internet search query trends and electronic health records to predict acute gastroenteritis (AG) incidence rates in near real time, at the national and regional scales, and for long-term forecasts (up to 10 weeks). METHODS: We present 2 different approaches (linear and nonlinear) that produce real-time estimates, short-term forecasts, and long-term forecasts of AG activity at 2 different spatial scales in France (national and regional). Both approaches leverage disparate data sources that include disease-related internet search activity, electronic health record data, and historical disease activity. RESULTS: Our results suggest that all data sources contribute to improving gastroenteritis surveillance for long-term forecasts with the prominent predictive power of historical data owing to the strong seasonal dynamics of this disease. CONCLUSIONS: The methods we developed could help reduce the impact of the AG peak by making it possible to anticipate increased activity by up to 10 weeks.


Subject(s)
Disease Outbreaks , Electronic Health Records , Humans , Public Health/methods , Internet , France/epidemiology
18.
JMIR Public Health Surveill ; 8(12): e37122, 2022 12 22.
Article in English | MEDLINE | ID: mdl-36548023

ABSTRACT

BACKGROUND: Traditionally, dengue prevention and control rely on vector control programs and reporting of symptomatic cases to a central health agency. However, case reporting is often delayed, and the true burden of dengue disease is often underestimated. Moreover, some countries do not have routine control measures for vector control. Therefore, researchers are constantly assessing novel data sources to improve traditional surveillance systems. These studies are mostly carried out in big territories and rarely in smaller endemic regions, such as Martinique and the Lesser Antilles. OBJECTIVE: The aim of this study was to determine whether heterogeneous real-world data sources could help reduce reporting delays and improve dengue monitoring on the island of Martinique, a small endemic region. METHODS: Heterogeneous data sources (hospitalization data, entomological data, and Google Trends) and dengue surveillance reports for the last 14 years (January 2007 to February 2021) were analyzed to identify associations with dengue outbreaks and their time lags. RESULTS: The dengue hospitalization rate was the variable most strongly correlated with the increase in dengue positivity rate by real-time reverse transcription polymerase chain reaction (Pearson correlation coefficient=0.70) with a time lag of -3 weeks. Weekly entomological interventions were also correlated with the increase in dengue positivity rate by real-time reverse transcription polymerase chain reaction (Pearson correlation coefficient=0.59) with a time lag of -2 weeks. The most correlated query from Google Trends was the "Dengue" topic restricted to the Martinique region (Pearson correlation coefficient=0.637) with a time lag of -3 weeks. CONCLUSIONS: Real-world data are valuable data sources for dengue surveillance in smaller territories. Many of these sources precede the increase in dengue cases by several weeks, and therefore can help to improve the ability of traditional surveillance systems to provide an early response in dengue outbreaks. All these sources should be better integrated to improve the early response to dengue outbreaks and vector-borne diseases in smaller endemic territories.


Subject(s)
Disease Outbreaks , Humans , Retrospective Studies , Martinique/epidemiology
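
The lagged correlations reported in the abstract above can be illustrated with a short sketch that shifts one weekly series against another and keeps the shift with the highest Pearson coefficient. This is an assumption-laden illustration, not the study's code: the function names are hypothetical, the series are toy values, and a "lead" of k weeks here is assumed to correspond to what the abstract reports as a time lag of -k weeks.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def best_lead(signal, target, max_lead):
    """Find by how many weeks `signal` best precedes `target`.

    For each lead k, signal[t] is paired with target[t + k].
    Both series must have the same length and weekly spacing.
    """
    scores = {}
    for lead in range(max_lead + 1):
        early = signal[: len(signal) - lead]
        late = target[lead:]
        scores[lead] = pearson(early, late)
    lead = max(scores, key=scores.get)
    return lead, scores[lead]


# Toy weekly series: the target repeats the signal two weeks later.
toy_signal = [1, 4, 2, 8, 5, 7, 3, 9, 6, 10, 2, 5]
toy_target = [0, 0] + toy_signal[:-2]
print(best_lead(toy_signal, toy_target, max_lead=4))
```

Each extra week of lead drops one pair of observations, so coefficients at long leads rest on fewer points and should be compared with care.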
19.
JMIR Med Inform ; 10(11): e36711, 2022 Nov 01.
Article in English | MEDLINE | ID: mdl-36318244

ABSTRACT

BACKGROUND: Often missing from or uncertain in a biomedical data warehouse (BDW), vital status after discharge is central to the value of a BDW in medical research. The French National Mortality Database (FNMD) offers open-source nominative records of every death. Matching large-scale BDWs records with the FNMD combines multiple challenges: absence of unique common identifiers between the 2 databases, names changing over life, clerical errors, and the exponential growth of the number of comparisons to compute. OBJECTIVE: We aimed to develop a new algorithm for matching BDW records to the FNMD and evaluated its performance. METHODS: We developed a deterministic algorithm based on advanced data cleaning and knowledge of the naming system and the Damerau-Levenshtein distance (DLD). The algorithm's performance was independently assessed using BDW data of 3 university hospitals: Lille, Nantes, and Rennes. Specificity was evaluated with living patients on January 1, 2016 (ie, patients with at least 1 hospital encounter before and after this date). Sensitivity was evaluated with patients recorded as deceased between January 1, 2001, and December 31, 2020. The DLD-based algorithm was compared to a direct matching algorithm with minimal data cleaning as a reference. RESULTS: All centers combined, sensitivity was 11% higher for the DLD-based algorithm (93.3%, 95% CI 92.8-93.9) than for the direct algorithm (82.7%, 95% CI 81.8-83.6; P<.001). Sensitivity was superior for men at 2 centers (Nantes: 87%, 95% CI 85.1-89 vs 83.6%, 95% CI 81.4-85.8; P=.006; Rennes: 98.6%, 95% CI 98.1-99.2 vs 96%, 95% CI 94.9-97.1; P<.001) and for patients born in France at all centers (Nantes: 85.8%, 95% CI 84.3-87.3 vs 74.9%, 95% CI 72.8-77.0; P<.001). The DLD-based algorithm revealed significant differences in sensitivity among centers (Nantes, 85.3% vs Lille and Rennes, 97.3%, P<.001). Specificity was >98% in all subgroups. 
Our algorithm matched tens of millions of death records from BDWs, with parallel computing capabilities and low RAM requirements. We used the Inseehop open-source R script for this measurement. CONCLUSIONS: Overall, sensitivity/recall was 11 percentage points higher with the DLD-based algorithm than with the direct algorithm. This shows the importance of advanced data cleaning and knowledge of a naming system through DLD use. Statistically significant differences in sensitivity between groups could be found and must be considered when performing an analysis to avoid differential biases. Our algorithm, originally conceived for linking a BDW with the FNMD, can be used to match any large-scale databases. While matching operations using names are considered sensitive computational operations, the Inseehop package released here is easy to run on premises, thereby facilitating compliance with local cybersecurity frameworks. The use of an advanced deterministic matching algorithm such as the DLD-based algorithm is an insightful example of combining open-source external data to improve the usage value of BDWs.
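As a rough illustration of the matching idea described in this abstract, the sketch below cleans names and then compares them with an edit distance that counts an adjacent transposition as a single error. It is not the Inseehop implementation: the cleaning rules, the record fields (`last_name`, `first_name`, `birth_date`), the `max_distance` threshold, and the use of the optimal string alignment variant (a restricted form of the Damerau-Levenshtein distance) are all assumptions made for the example.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Illustrative cleaning (assumed rules): drop accents, uppercase, keep letters only."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in no_accents.upper() if c.isalpha())


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Counts insertions, deletions, substitutions, and transpositions of two
    adjacent characters, each with cost 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "TI" <-> "IT"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def is_match(rec_a: dict, rec_b: dict, max_distance: int = 1) -> bool:
    """Hypothetical rule: same birth date and each name within max_distance edits."""
    if rec_a["birth_date"] != rec_b["birth_date"]:
        return False
    return all(
        osa_distance(normalize_name(rec_a[k]), normalize_name(rec_b[k])) <= max_distance
        for k in ("last_name", "first_name")
    )
```

Under these assumptions, a hospital record for "Hélène Martin" would match a death record spelled "HELENE MARITN" with the same birth date, whereas an exact string comparison would miss it. Requiring an exact birth date before computing any distance is one simple way to limit the number of name comparisons.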

20.
JMIR Med Inform ; 10(10): e38936, 2022 Oct 17.
Article in English | MEDLINE | ID: mdl-36251369

ABSTRACT

BACKGROUND: Despite the many opportunities data reuse offers, its implementation presents many difficulties, and raw data cannot be reused directly. Information is not always directly available in the source database and needs to be computed afterwards from raw data by defining an algorithm. OBJECTIVE: The main purpose of this article is to present a standardized description of the steps and transformations required during the feature extraction process when conducting retrospective observational studies. A secondary objective is to identify how the features could be stored in the schema of a data warehouse. METHODS: This study involved the following 3 main steps: (1) the collection of relevant study cases related to feature extraction and based on the automatic and secondary use of data; (2) the standardized description of raw data, steps, and transformations, which were common to the study cases; and (3) the identification of an appropriate table to store the features in the Observational Medical Outcomes Partnership (OMOP) common data model (CDM). RESULTS: We interviewed 10 researchers from 3 French university hospitals and a national institution, who were involved in 8 retrospective and observational studies. Based on these studies, 2 states (track and feature) and 2 transformations (track definition and track aggregation) emerged. "Track" is a time-dependent signal or period of interest, defined by a statistical unit, a value, and 2 milestones (a start event and an end event). "Feature" is time-independent high-level information with dimensionality identical to the statistical unit of the study, defined by a label and a value. In a feature, the time dimension becomes implicit in the value or name of the variable. We propose the 2 tables "TRACK" and "FEATURE" to store variables obtained in feature extraction and extend the OMOP CDM. CONCLUSIONS: We propose a standardized description of the feature extraction process.
The process combines the 2 steps of track definition and track aggregation. Dividing feature extraction into these 2 steps confines the difficulty to track definition. The standardization of tracks requires great expertise with regard to the data, but allows the application of an infinite number of complex transformations. In contrast, track aggregation is a very simple operation with a finite number of possibilities. A complete description of these steps could enhance the reproducibility of retrospective studies.
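The two states and two transformations described in this abstract can be pictured with a small sketch. Everything below is an assumption made for illustration, not the authors' schema: the field names, the threshold rule used to define a track, and the choice of reducer are hypothetical, and the classes do not reproduce the proposed "TRACK" and "FEATURE" tables.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Track:
    """Time-dependent period of interest for one statistical unit."""
    unit_id: int
    label: str
    value: float
    start: datetime
    end: datetime


@dataclass(frozen=True)
class Feature:
    """Time-independent information: one label and one value per statistical unit."""
    unit_id: int
    label: str
    value: float


def define_tracks(unit_id: int, samples: Sequence[Tuple[datetime, float]],
                  threshold: float, label: str) -> List[Track]:
    """Track definition (assumed rule): each run of consecutive samples strictly
    below the threshold becomes one track whose value is the run's minimum."""
    tracks: List[Track] = []
    run: List[Tuple[datetime, float]] = []

    def close_run() -> None:
        if run:
            tracks.append(Track(unit_id, label, min(v for _, v in run),
                                run[0][0], run[-1][0]))
            run.clear()

    for ts, val in samples:
        if val < threshold:
            run.append((ts, val))
        else:
            close_run()
    close_run()
    return tracks


def aggregate_tracks(tracks: Sequence[Track], label: str,
                     reducer: Callable[[List[Track]], float]) -> List[Feature]:
    """Track aggregation: reduce all tracks of a unit to a single feature value."""
    by_unit: Dict[int, List[Track]] = {}
    for t in tracks:
        by_unit.setdefault(t.unit_id, []).append(t)
    return [Feature(uid, label, float(reducer(ts))) for uid, ts in by_unit.items()]


# Usage: five measurements one minute apart, hypothetical threshold of 65
t0 = datetime(2022, 1, 1, 8, 0)
samples = [(t0 + timedelta(minutes=i), v) for i, v in enumerate([70, 60, 58, 72, 55])]
tracks = define_tracks(1, samples, threshold=65, label="low_value_episode")
features = aggregate_tracks(tracks, "n_low_value_episodes", len)
```

In this sketch the domain knowledge sits in `define_tracks` (what counts as an episode), while `aggregate_tracks` only applies a simple reducer such as a count, minimum, or total duration, which mirrors the division of difficulty described in the abstract.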
