Results 1 - 20 of 52
1.
Pharmacoepidemiol Drug Saf ; 33(4): e5785, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38565526

ABSTRACT

INTRODUCTION: During the COVID-19 pandemic, inpatient electronic health records (EHRs) have been used to conduct public health surveillance and assess treatments and outcomes. Invasive mechanical ventilation (MV) and supplemental oxygen (O2) use are markers of severe illness in hospitalized COVID-19 patients. In a large US system (n = 142 hospitals), we assessed documentation of MV and O2 use during COVID-19 hospitalization in administrative data versus nursing documentation. METHODS: We identified 319 553 adult hospitalizations with a COVID-19 diagnosis, February 2020-October 2022, and extracted coded, administrative data for MV or O2. Separately, we developed classification rules for MV or O2 supplementation from semi-structured nursing documentation. We assessed MV and O2 supplementation in administrative data versus nursing documentation and calculated ordinal endpoints of decreasing COVID-19 disease severity. Nursing documentation was considered the gold standard in sensitivity and positive predictive value (PPV) analyses. RESULTS: In nursing documentation, the prevalence of MV and O2 supplementation among COVID-19 hospitalizations was 14% and 75%, respectively. The sensitivity of administrative data was 83% for MV and 41% for O2, with both PPVs above 91%. Concordance between sources was 97% for MV (κ = 0.85), and 54% for O2 (κ = 0.21). For ordinal endpoints, administrative data accurately identified intensive care and MV but underestimated hospitalizations with O2 requirements (42% vs. 18%). CONCLUSIONS: In comparison to nursing documentation, administrative data under-ascertained O2 supplementation but accurately estimated severe endpoints such as MV. Nursing documentation improved ascertainment of O2 among COVID-19 hospitalizations and can capture oxygen requirements in adults hospitalized with COVID-19 or other respiratory illnesses.


Subjects
COVID-19 , Adult , Humans , United States/epidemiology , COVID-19/epidemiology , Electronic Health Records , Inpatients , Pandemics , COVID-19 Testing , Oxygen
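The sensitivity, PPV, concordance and κ figures reported in this abstract all derive from a 2x2 cross-tabulation of the two sources, with nursing documentation as the reference. The sketch below shows that arithmetic under stated assumptions: the class name `TwoByTwo` and the counts are hypothetical and are not the study's data or code.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Record-level agreement between a test source and a reference source."""
    both_pos: int   # flagged by the test source and by the reference
    test_only: int  # flagged by the test source only (false positives)
    ref_only: int   # flagged by the reference only (missed by the test source)
    both_neg: int   # flagged by neither source

    @property
    def n(self) -> int:
        return self.both_pos + self.test_only + self.ref_only + self.both_neg

    def sensitivity(self) -> float:
        # share of reference-positive records that the test source also flags
        return self.both_pos / (self.both_pos + self.ref_only)

    def ppv(self) -> float:
        # share of test-positive records that the reference confirms
        return self.both_pos / (self.both_pos + self.test_only)

    def concordance(self) -> float:
        # observed agreement: both sources say yes or both say no
        return (self.both_pos + self.both_neg) / self.n

    def kappa(self) -> float:
        # Cohen's kappa: observed agreement corrected for chance agreement
        p_o = self.concordance()
        p_test = (self.both_pos + self.test_only) / self.n
        p_ref = (self.both_pos + self.ref_only) / self.n
        p_e = p_test * p_ref + (1 - p_test) * (1 - p_ref)
        return (p_o - p_e) / (1 - p_e)


# Hypothetical counts for 100 hospitalizations (not data from the study)
o2 = TwoByTwo(both_pos=40, test_only=2, ref_only=35, both_neg=23)
print(f"sens={o2.sensitivity():.3f} ppv={o2.ppv():.3f} "
      f"concordance={o2.concordance():.2f} kappa={o2.kappa():.3f}")
```

With these invented counts, sensitivity is about 0.53 while PPV is about 0.95, and κ (about 0.31) sits well below raw concordance (0.63) because part of the raw agreement is expected by chance given the marginal totals.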
2.
BMC Med Res Methodol ; 23(1): 46, 2023 02 17.
Article in English | MEDLINE | ID: mdl-36800930

ABSTRACT

BACKGROUND: Multi-institution electronic health records (EHR) are a rich source of real-world data (RWD) for generating real-world evidence (RWE) regarding the utilization, benefits and harms of medical interventions. They provide access to clinical data from large pooled patient populations, as well as laboratory measurements unavailable in insurance claims-based data. However, secondary use of these data for research requires specialized knowledge and careful evaluation of data quality and completeness. We discuss the data quality assessments undertaken during prep-to-research work, focusing on the investigation of treatment safety and effectiveness. METHODS: Using the National COVID Cohort Collaborative (N3C) enclave, we defined a patient population using criteria typical of non-interventional inpatient drug effectiveness studies. We present the challenges encountered when constructing this dataset, beginning with an examination of data quality across data partners. We then discuss the methods and best practices used to operationalize several important study elements: exposure to treatment, baseline health comorbidities, and key outcomes of interest. RESULTS: We share our experiences and lessons learned when working with heterogeneous EHR data from over 65 healthcare institutions and 4 common data models. We discuss six key areas of data variability and quality. (1) The specific EHR data elements captured from a site can vary depending on source data model and practice. (2) Data missingness remains a significant issue. (3) Drug exposures can be recorded at different levels and may not contain route of administration or dosage information. (4) Reconstruction of continuous drug exposure intervals may not always be possible. (5) EHR discontinuity is a major concern for capturing history of prior treatment and comorbidities. Lastly, (6) access to EHR data alone limits the outcomes that can be used in studies.
CONCLUSIONS: The creation of large scale centralized multi-site EHR databases such as N3C enables a wide range of research aimed at better understanding treatments and health impacts of many conditions including COVID-19. As with all observational research, it is important that research teams engage with appropriate domain experts to understand the data in order to define research questions that are both clinically important and feasible to address using these real world data.


Subjects
COVID-19 , Humans , Data Accuracy , COVID-19 Drug Treatment , Data Collection
3.
J Biomed Inform ; 143: 104403, 2023 07.
Article in English | MEDLINE | ID: mdl-37230406

ABSTRACT

With the growth of data and intelligent technologies, the healthcare sector has adopted numerous technologies that enable services for patients, clinicians, and researchers. One major hurdle in achieving state-of-the-art results in health informatics is domain-specific terminology and its semantic complexity. A knowledge graph built from medical concepts, events, and relationships acts as a medical semantic network for extracting new links and hidden patterns from health data sources. Current studies of medical knowledge graph construction are limited to generic techniques and opportunities and focus less on exploiting real-world data sources. A knowledge graph constructed from Electronic Health Records (EHR) data draws on real-world data from healthcare records. This can improve results in downstream tasks such as knowledge extraction and inference, knowledge graph completion, and medical knowledge graph applications such as diagnosis prediction, clinical recommendation, and clinical decision support. This review critically analyses existing works on medical knowledge graphs that used EHR data as the data source at the (i) representation level, (ii) extraction level, and (iii) completion level. We found that EHR-based knowledge graph construction involves challenges such as the high complexity and dimensionality of the data, the lack of knowledge fusion, and the need for dynamic updates of the knowledge graph. In addition, the study presents possible ways to tackle the identified challenges. Our findings suggest that future research should focus on knowledge graph integration and knowledge graph completion challenges.


Subjects
Decision Support Systems, Clinical , Electronic Health Records , Humans , Pattern Recognition, Automated , Knowledge Bases , Delivery of Health Care
4.
J Biomed Inform ; 139: 104239, 2023 03.
Article in English | MEDLINE | ID: mdl-36356933

ABSTRACT

Deep learning methods have achieved success in disease prediction using electronic health record (EHR) data, but most existing methods have limitations. First, most methods apply a homogeneous decay to model the effect of the time interval on information from a patient's previous visits. However, the effect of the time interval between visits is not always negative. For example, although the interval between visits for patients with chronic diseases is relatively long, the previous visit remains highly important for the next one, so the effect of the time interval cannot be assumed to be negative. That is, the time interval affects previous visits in a nonmonotonic manner, and the effect can be positive, negative, or neutral. Second, most methods do not take into account the effect of text information on prediction results. The text in an EHR describes the patient's past medical history and current symptoms, which is important for prediction. To address these issues, we propose a Time Interval Uncertainty-Aware and Text-Enhanced Based Disease Prediction Model, which exploits the uncertain effects of time intervals and patients' text information for disease prediction. First, we apply a cross-attention mechanism to generate a global representation of the patient from the patient's disease and text information in the EHR. Then, we use a key-query attention mechanism to obtain importance weights for the two visit sequences, with and without time intervals. Finally, we perform disease prediction by slightly adjusting the encoder of the Transformer, a deep learning model based on a self-attention mechanism.
We compare our model with various state-of-the-art models on two publicly available datasets, MIMIC-III and MIMIC-IV, selecting the 10 most frequent diseases in the dataset as the target diseases. On the MIMIC-III dataset, our model scores up to three percent higher than the best-performing baseline on the evaluation metrics.


Subjects
Electronic Health Records , Humans , Uncertainty
5.
J Biomed Inform ; 144: 104390, 2023 08.
Article in English | MEDLINE | ID: mdl-37182592

ABSTRACT

Recent work has shown that predictive models can be applied to structured electronic health record (EHR) data to stratify autism likelihood from an early age (<1 year). Integrating clinical narratives (or notes) with structured data has been shown to improve prediction performance in other clinical applications, but the added predictive value of this information in early autism prediction has not yet been explored. In this study, we aimed to enhance the performance of early autism prediction by using both structured EHR data and clinical narratives. We built models based on structured data and on clinical narratives separately, and then an ensemble model that integrated both sources of data. Using data from Duke University Health System spanning 14 years, we evaluated ensemble models predicting later autism diagnosis (by age 4 years) from data collected from ages 30 to 360 days. Our sample included 11,750 children above by age 3 years (385 meeting autism diagnostic criteria). The ensemble model for autism prediction showed superior performance: at age 30 days it achieved 46.8% sensitivity (95% confidence interval, CI: 22.0%, 52.9%), 28.0% positive predictive value (PPV) at high (90%) specificity (CI: 2.0%, 33.1%), and an AUC4 (with at least 4-year follow-up for controls) of 0.769 (CI: 0.715, 0.811). Prediction by 360 days achieved 44.5% sensitivity (CI: 23.6%, 62.9%), 13.7% PPV at high (90%) specificity (CI: 9.6%, 18.9%), and an AUC4 of 0.797 (CI: 0.746, 0.840). Results show that incorporating clinical narratives in early autism prediction achieved promising accuracy by age 30 days, outperforming models based on structured data only. Furthermore, findings suggest that additional features learned from clinical narratives might be hypothesis-generating for understanding early development in autism.


Subjects
Autistic Disorder , Electronic Health Records , Child , Humans , Infant , Child, Preschool , Autistic Disorder/diagnosis , Predictive Value of Tests , Narration , Electronics
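Sensitivity and PPV "at high (90%) specificity", as reported in this abstract, come from fixing a decision threshold on the model's scores so that about 90% of controls fall below it. A minimal sketch of that procedure follows; the function name, the "score > threshold" rule, and the scores are assumptions for illustration, not the authors' code or data.

```python
import math
from typing import Sequence, Tuple


def metrics_at_specificity(
    scores: Sequence[float], labels: Sequence[int], target_spec: float = 0.90
) -> Tuple[float, float, float, float]:
    """Return (threshold, sensitivity, PPV, specificity) for the rule
    'predict positive if score > threshold', with the threshold chosen so that
    at least `target_spec` of true negatives score at or below it."""
    negatives = sorted(s for s, y in zip(scores, labels) if y == 0)
    k = max(math.ceil(target_spec * len(negatives)) - 1, 0)
    threshold = negatives[k]

    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > threshold)
    tn = len(negatives) - fp

    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    specificity = tn / len(negatives)
    return threshold, sensitivity, ppv, specificity


# Hypothetical model scores: 20 controls and 4 cases (not data from the study)
scores = [i / 20 for i in range(20)] + [0.97, 0.92, 0.50, 0.20]
labels = [0] * 20 + [1] * 4
print(metrics_at_specificity(scores, labels))
```

When cases are rare (385 of 11,750 children in this sample, roughly 3%), the 10% of controls that still score above such a threshold can outnumber the detected cases, which is why PPV stays well below sensitivity even at high specificity.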
6.
J Asthma ; 60(8): 1573-1583, 2023 08.
Article in English | MEDLINE | ID: mdl-36562525

ABSTRACT

OBJECTIVE: To evaluate a nurse-initiated quality improvement (QI) intervention aimed at enhancing asthma treatment in a pediatric emergency department (ED), using both patient outcomes and workflow. METHODS: We evaluated the impact of QI interventions for pediatric patients presenting to the ED with asthma using a pre-post analysis. A pediatric asthma score (PAS) of >8 indicated moderate to severe asthma. This secondary analysis of electronic health record (EHR) data evaluated 1) patient outcomes (time to clinical treatment, ED length of stay [EDLOS], admissions and discharges home) and 2) clinical workflow. RESULTS: We compared 886 visits occurring between 01/01/2015 and 09/27/2015 (pre-implementation period) with 752 visits between 01/01/2016 and 09/27/2016 (post-implementation). Time to first documentation of PAS decreased by more than 30 min post-intervention (75 ± 57 to 39 ± 54 min; p<.001). Time to treatment decreased significantly for both steroid and bronchodilator administration (both p<.001). EDLOS did not change significantly. When stratified by acuity, high-acuity patients (PAS ≥8) discharged home from the ED had significant decreases in time to initial PAS, time to steroid and bronchodilator use, and EDLOS. Among high-acuity patients admitted to the hospital, time to first PAS differed pre- to post-implementation (p<.05), but time to treatment did not. Workflow visualization provided additional insights and detailed (task-level) comparisons of the timing of ED activities. CONCLUSIONS: Nurse-initiated ED interventions can significantly improve the timeliness of pediatric asthma evaluation and treatment. Examining workflow along with outcomes can better inform QI evaluations and clinical management.


Subjects
Asthma , Humans , Child , Asthma/drug therapy , Bronchodilator Agents/therapeutic use , Quality Improvement , Workflow , Emergency Service, Hospital
7.
BMC Med ; 20(1): 243, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35791013

ABSTRACT

BACKGROUND: Although COVID-19 vaccines are highly effective, breakthrough infection remains possible in fully vaccinated individuals. With SARS-CoV-2 variants still circulating, describing the characteristics of individuals who have experienced COVID-19 vaccine breakthroughs could be important in determining who may be at greatest risk. METHODS: With the approval of NHS England, we conducted a retrospective cohort study using routine clinical data on fully vaccinated individuals from the OpenSAFELY-TPP database, linked to secondary care and death registry data, and described the characteristics of those experiencing COVID-19 vaccine breakthroughs. RESULTS: As of 1st November 2021, a total of 15,501,550 individuals were identified as fully vaccinated against COVID-19, with a median follow-up time of 149 days (IQR: 107-179). Within this population, a total of 579,780 (<4%) individuals reported a positive SARS-CoV-2 test, corresponding to an incidence rate (IR) of 98.06 per 1000 person-years of follow-up (95% CI 97.93-98.19). There were 28,580 COVID-19-related hospital admissions, 1980 COVID-19-related critical care admissions and 6435 COVID-19-related deaths; corresponding IRs 4.77 (95% CI 4.74-4.80), 0.33 (95% CI 0.32-0.34) and 1.07 (95% CI 1.06-1.09), respectively. The highest rates of breakthrough COVID-19 were seen in those in care homes and in patients with chronic kidney disease, dialysis, transplant, haematological malignancy or who were immunocompromised. CONCLUSIONS: While the majority of COVID-19 vaccine breakthrough cases in England were mild, differences in rates of breakthrough cases were identified across several clinical groups.
These findings are descriptive and cannot explain why certain groups have higher rates of COVID-19 breakthrough than others. Nevertheless, the emergence of the Omicron variant of SARS-CoV-2, coupled with the number of positive SARS-CoV-2 tests still occurring, is concerning, and as the number of fully vaccinated (and boosted) individuals increases and follow-up time lengthens, so too will the number of COVID-19 breakthrough cases. Additional analyses assessing vaccine waning and rates of breakthrough COVID-19 across variants are needed to identify individuals at higher risk.


Subjects
COVID-19 Vaccines , COVID-19 , COVID-19/epidemiology , COVID-19/prevention & control , Chickenpox Vaccine , Cohort Studies , England/epidemiology , Humans , Retrospective Studies , SARS-CoV-2 , Vaccination
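The incidence rates in this abstract are expressed per 1000 person-years of follow-up, with 95% confidence intervals. The abstract does not state how the intervals were computed; the sketch below assumes one common choice, a Poisson count with a log-scale normal approximation. The function names and the cohort figures are hypothetical.

```python
import math


def incidence_rate(events: int, person_years: float, per: float = 1000.0, z: float = 1.96):
    """Incidence rate per `per` person-years with an approximate 95% CI.

    The event count is treated as Poisson; on the log scale the standard error
    of the rate is about 1/sqrt(events), and the interval is back-transformed."""
    rate = events / person_years * per
    se_log = 1 / math.sqrt(events)
    return rate, rate * math.exp(-z * se_log), rate * math.exp(z * se_log)


def person_years(follow_up_days):
    """Total person-time: every individual contributes their own follow-up."""
    return sum(follow_up_days) / 365.25


# Hypothetical cohort: 500 events over 12,500 person-years (not study data)
rate, lower, upper = incidence_rate(500, 12_500)
print(f"IR = {rate:.2f} per 1000 person-years (95% CI {lower:.2f}-{upper:.2f})")
```

Summing individual follow-up, rather than multiplying cohort size by median follow-up, matters here because follow-up lengths vary (IQR 107-179 days).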
8.
J Urban Health ; 99(6): 984-997, 2022 12.
Article in English | MEDLINE | ID: mdl-36367672

ABSTRACT

There is tremendous interest in understanding how neighborhoods affect health by linking extant social and environmental drivers of health (SDOH) data with electronic health record (EHR) data. Studies quantifying such associations often use static neighborhood measures. Little research examines the impact of gentrification (a measure of neighborhood change) on the health of long-term neighborhood residents using EHR data, which may cover a more generalizable population than traditional approaches. We quantified associations between gentrification and health and healthcare utilization by linking longitudinal socioeconomic data from the American Community Survey with EHR data across two health systems accessed by long-term residents of Durham County, NC, from 2007 to 2017. Census block group-level neighborhoods were eligible to be gentrified if they had low socioeconomic status relative to the county average. Gentrification was defined with the Steinmetz-Wood definition, using socioeconomic data from 2006-2010 and 2011-2015. Multivariable logistic and Poisson regression models estimated associations between gentrification and development of health indicators (cardiovascular disease, hypertension, diabetes, obesity, asthma, depression) or healthcare encounters (emergency department [ED], inpatient, or outpatient). Sensitivity analyses examined two alternative gentrification measures. Of the 99 block groups within the city of Durham, 28 were eligible (N = 10,807; median age = 42; 83% Black; 55% female) and 5 gentrified. Individuals in gentrifying neighborhoods had lower odds of obesity (odds ratio [OR] = 0.89; 95% confidence interval [CI]: 0.81-0.99), higher odds of an ED encounter (OR = 1.10; 95% CI: 1.01-1.20), and lower risk for outpatient encounters (incidence rate ratio = 0.93; 95% CI: 0.87-1.00) compared with non-gentrifying neighborhoods. The association between gentrification and health and healthcare utilization was sensitive to the gentrification definition.


Subjects
Residence Characteristics , Residential Segregation , Humans , Female , Adult , Male , Patient Acceptance of Health Care , Odds Ratio , Obesity
9.
J Biomed Inform ; 134: 104163, 2022 10.
Article in English | MEDLINE | ID: mdl-36038064

ABSTRACT

We develop an unsupervised probabilistic model for heterogeneous Electronic Health Record (EHR) data. Utilizing a mixture model formulation, our approach directly models sequences of arbitrary length, such as medications and laboratory results. This allows for subgrouping and incorporation of the dynamics underlying heterogeneous data types. The model consists of a layered set of latent variables that encode underlying structure in the data. These variables represent subject subgroups at the top layer, and unobserved states for sequences in the second layer. We train this model on episodic data from subjects receiving medical care in the Kaiser Permanente Northern California integrated healthcare delivery system. The resulting properties of the trained model generate novel insight from these complex and multifaceted data. In addition, we show how the model can be used to analyze sequences that contribute to assessment of mortality likelihood.


Subjects
Delivery of Health Care, Integrated , Electronic Health Records , Humans , Models, Statistical , Probability
10.
J Biomed Inform ; 127: 104010, 2022 03.
Article in English | MEDLINE | ID: mdl-35151869

ABSTRACT

Multimorbidity is a major factor contributing to increased mortality among people with severe mental illnesses (SMI). Previous studies either focus on estimating the prevalence of a disease in a population without considering relationships between diseases, or ignore the heterogeneity of individual patients by examining disease progression only through aggregates across a whole cohort. Here, we present a temporal bipartite network model to jointly represent detailed information on both individual patients and diseases, which allows us to systematically characterize disease trajectories from both patient-centric and disease-centric perspectives. We apply this approach to a large set of longitudinal diagnostic records for patients with SMI, collected through a data linkage between electronic health records from a large UK mental health hospital and the English national hospital administrative database. We find that the resulting diagnosis networks show disassortative mixing by degree, suggesting that patients affected by a small number of diseases tend to suffer from prevalent diseases. Factors that determine the network structures include an individual's age, gender and ethnicity. Our analysis of network evolution further shows that patients and diseases become more interconnected over the illness duration of SMI, a process largely driven by patients with similar attributes tending to suffer from the same conditions. Our analytic approach provides a guide for future patient-centric research on multimorbidity trajectories and contributes to achieving precision medicine.


Subjects
Mental Disorders , Multimorbidity , Electronic Health Records , Humans , Mental Disorders/epidemiology , Patient-Centered Care , Prevalence
11.
J Biomed Inform ; 115: 103686, 2021 03.
Article in English | MEDLINE | ID: mdl-33493631

ABSTRACT

OBJECTIVE: As Electronic Health Records (EHR) data have accumulated rapidly in recent years, the large volume of patient clinical data offers opportunities to discover real-world evidence. In this study, a graphical disease network, named progressive cardiovascular disease network (progCDN), was built to delineate the progression profiles of cardiovascular diseases (CVD). MATERIALS AND METHODS: The EHR data of 14.3 million patients with CVD diagnoses were collected for building the disease network and for further analysis. We applied a newly designed method, progression rates (PR), to calculate the progression relationships among different diagnoses. Based on the disease network outcome, 23 disease progression pairs were selected to screen for salient features. RESULTS: The network depicted the dominant diseases in CVD development, such as heart failure and coronary arteriosclerosis. Novel progression relationships were also discovered, such as the progression path from long QT syndrome to major depression. In addition, three age-group progCDNs identified a series of age-associated disease progression paths and important successor diseases with age bias. Furthermore, a list of important features with sufficient abundance and high correlation was extracted for building disease risk models. DISCUSSION: Owing to its flexibility and robustness, the PR method designed for identifying progression relationships could be widely applied to any EHR database. Meanwhile, researchers could use the progCDN network to validate or explore novel disease relationships in real-world data. CONCLUSION: This first interrogation of such a large CVD patient cohort enabled us to explore the general and age-specific disease progression patterns in CVD development.


Subjects
Cardiovascular Diseases , Cardiovascular Diseases/diagnosis , Cohort Studies , Databases, Factual , Disease Progression , Electronic Health Records , Humans
12.
J Biomed Inform ; 113: 103667, 2021 01.
Article in English | MEDLINE | ID: mdl-33359112

ABSTRACT

Temporal medical data are increasingly integrated into the development of data-driven methods to deliver better healthcare. Searching such data for patterns can improve the detection of disease cases and facilitate the design of preemptive interventions. For example, specific temporal patterns could be used to recognize low-prevalence diseases, which are often under-diagnosed. However, searching for these patterns in temporal medical data is challenging, as the data are often noisy, complex, and large in scale. In this work, we propose an effective and efficient solution to search for patients who exhibit conditions that resemble the input query. In our solution, we propose a similarity notion based on the Longest Common Subsequence (LCSS), which is used to measure the similarity between the query and the patient's temporal medical data and to ensure robustness against noise in the data. Our solution adopts locality-sensitive hashing techniques to address the high dimensionality of medical data, by embedding the recorded clinical events (e.g., medications and diagnosis codes) into compact signatures. To perform pattern search in large EHR datasets, we propose a filtering approach based on tandem patterns, which effectively identifies candidate matches while discarding irrelevant data. The evaluations conducted using a real-world dataset demonstrate that our solution is highly accurate while significantly accelerating the similarity search.
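The robustness to noise that this abstract attributes to LCSS comes from matching events in order without requiring them to be adjacent, so unrelated events in a patient's record are simply skipped. The sketch below assumes the textbook longest-common-subsequence recurrence on discrete event codes, normalized by query length; the paper's exact LCSS variant (for example, any time constraints between matched events), its locality-sensitive hashing and its tandem-pattern filtering are not shown, and the event labels are hypothetical.

```python
from typing import Hashable, Sequence


def lcss_length(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Longest common subsequence of two event sequences, by dynamic programming
    in O(len(a) * len(b)) time and O(len(b)) memory. Matched events must keep
    their order but need not be adjacent, so unrelated events are skipped."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def lcss_similarity(query: Sequence[Hashable], record: Sequence[Hashable]) -> float:
    """Fraction of the query's events found, in order, in the patient's record."""
    return lcss_length(query, record) / len(query) if query else 0.0


# Hypothetical event codes (illustrative labels only)
query = ["drug_A", "lab_high", "drug_B"]
patient_1 = ["drug_A", "vaccine", "lab_high", "imaging", "drug_B"]
patient_2 = ["drug_B", "drug_A"]
print(lcss_similarity(query, patient_1), lcss_similarity(query, patient_2))
```

Here `patient_1` matches the whole query despite two interleaved events, while `patient_2` shares only one event in the right order. The quadratic cost per comparison is what the paper's hashing and filtering steps are meant to avoid at scale.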

13.
BMC Med Inform Decis Mak ; 21(Suppl 2): 53, 2021 07 30.
Article in English | MEDLINE | ID: mdl-34330258

ABSTRACT

BACKGROUND: Although China has encouraged EHR usage in hospitals for more than a decade, patients' access to their own EHR data is still not as widely utilized as expected. METHODS: We designed a survey with four categories of measures and conducted field interviews to identify whether hospitals had already released EHR data to patients (inpatients or outpatients), the top EHR release contents, and the most popular release software. RESULTS: Of the 1344 responding hospitals from 30 provinces nationwide, 41.37% had already released their EHR data to patients, 97.12% of them through smart apps. More than 91% of hospitals use WeChat, and 32.37% of hospitals developed their own standalone apps or use vendors' apps. A total of 54.63% released data to both outpatients and inpatients, and the top release contents were all objective. A rough estimate suggests that releasing EHR data to patients via smart apps may save the hospital 15.9 million RMB per year and patients 9.4 million RMB altogether. CONCLUSIONS: EHR data release is believed to bring cost savings and efficiency gains for both patients and hospitals, but it is still considered spontaneous and requires legal support and government regulation.


Subjects
Electronic Health Records , Software , China , Hospitals , Humans
14.
J Biomed Inform ; 110: 103531, 2020 10.
Article in English | MEDLINE | ID: mdl-32818667

ABSTRACT

This paper considers the problems of modeling and predicting a long-term and "blurry" relapse that occurs after a medical act, such as a surgery. We do not consider a short-term complication related to the act itself, but a long-term relapse that clinicians cannot explain easily, since it depends on unknown sets or sequences of past events that occurred before the act. The relapse is observed only indirectly, in a "blurry" fashion, through longitudinal prescriptions of drugs over a long period of time after the medical act. We introduce a new model, called ZiMM (Zero-inflated Mixture of Multinomial distributions), in order to capture long-term and blurry relapses. On top of it, we build an end-to-end deep-learning architecture called ZiMM Encoder-Decoder (ZiMM ED) that can learn from the complex, irregular, highly heterogeneous and sparse patterns of health events observed through a claims-only database. ZiMM ED is applied to a "non-clinical" claims database that contains only timestamped reimbursement codes for drug purchases, medical procedures and hospital diagnoses, the only available clinical feature being the age of the patient. This setting is more challenging than one where bedside clinical signals are available. Our motivation for using such a non-clinical claims database is its population-wide exhaustiveness, compared to clinical electronic health records coming from a single hospital or a small set of hospitals. Indeed, we consider a dataset containing the claims of almost all French citizens who had surgery for prostatic problems, with a history of between 1.5 and 5 years. We consider a long-term (18 months) relapse (urination problems still occurring despite surgery), which is blurry since it is observed only through the reimbursement of a specific set of drugs for urination problems.
Our experiments show that ZiMM ED improves on several baselines, including non-deep-learning and deep-learning approaches, and that it allows working on such a dataset with minimal preprocessing.


Subjects
Deep Learning , Databases, Factual , Electronic Health Records , Humans , Recurrence
16.
J Biomed Inform ; 92: 103138, 2019 04.
Article in English | MEDLINE | ID: mdl-30825539

ABSTRACT

Electronic health record (EHR) data provide promising opportunities to explore personalized treatment regimes and to make clinical predictions. Compared with regular clinical data, EHR data are known for their irregularity and complexity. In addition, analyzing EHR data involves privacy issues, and sharing such data among multiple research sites is often infeasible due to regulatory and other hurdles. A recently published work uses contextual embedding models and successfully builds one predictive model for more than seventy common diagnoses. Despite its high predictive power, that model cannot be generalized to other institutions without sharing data. In this work, a novel method is proposed to learn from multiple databases and build predictive models based on Distributed Noise Contrastive Estimation (Distributed NCE). We use differential privacy to safeguard the intermediate information that is shared. A numerical study with a real dataset demonstrates that the proposed method can not only build predictive models in a distributed manner with privacy protection, but also preserve model structure well and achieve comparable prediction accuracy. The proposed methods have been implemented as a stand-alone Python library, and the implementation is available on Github (https://github.com/ziyili20/DistributedLearningPredictor) with installation instructions and use-cases.


Subjects
Computer Communication Networks , Electronic Health Records , Machine Learning , Diagnosis, Computer-Assisted , Humans
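This abstract states that differential privacy protects the intermediate information exchanged between sites, but it does not say which mechanism is used or exactly what is shared; the authors' own implementation is the GitHub library cited above. As a generic illustration only, the sketch below applies the standard Laplace mechanism to per-site summary vectors before pooling. The function names, the "site summaries", and the sensitivity and epsilon values are all hypothetical.

```python
import random
from typing import List, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale): the difference of two independent
    exponentials with mean `scale` has exactly this distribution."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def laplace_mechanism(values: Sequence[float], l1_sensitivity: float,
                      epsilon: float, rng: random.Random) -> List[float]:
    """Release `values` with epsilon-differential privacy, assuming their
    L1 sensitivity is at most `l1_sensitivity` (bounding it is the caller's job)."""
    scale = l1_sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]


def pooled_average(site_vectors: Sequence[Sequence[float]], l1_sensitivity: float,
                   epsilon: float, seed: int = 0) -> List[float]:
    """Each site privatizes its own vector before sharing; the coordinator
    only sees noisy vectors and averages them coordinate-wise.
    (One seeded generator stands in for the sites' independent generators.)"""
    rng = random.Random(seed)
    noisy = [laplace_mechanism(v, l1_sensitivity, epsilon, rng) for v in site_vectors]
    return [sum(column) / len(noisy) for column in zip(*noisy)]


# Hypothetical per-site summaries from three sites (not data from the study)
sites = [[0.2, 1.0], [0.4, 0.8], [0.3, 0.9]]
print(pooled_average(sites, l1_sensitivity=0.1, epsilon=1.0))
```

The privacy guarantee holds only if `l1_sensitivity` truly bounds how much one record can change a site's vector, which in practice is usually enforced by clipping; a smaller `epsilon` gives stronger privacy at the cost of noisier pooled estimates.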
17.
J Biomed Inform ; 100S: 100004, 2019.
Article in English | MEDLINE | ID: mdl-34384582

ABSTRACT

The pursuit of increased efficiency and quality of clinical care based on the analysis of workflow has seen the introduction of several modern technologies into medical environments. Electronic health records (EHRs) remain central to analysis of workflow, owing to their wide-ranging impact on clinical processes. The two most common interventions to facilitate EHR-related workflow analysis are automated location tracking using sensor-based technologies and EHR usage data logs. However, to maximize the potential of these technologies, and especially to facilitate workflow redesign, it is necessary to overlay these quantitative findings on the contextual data from qualitative methods such as ethnography. Such a complementary approach promises to yield more precise measures of clinical workflow that provide insights into how redesign could address inefficiencies. In this paper, we categorize clinical workflow in the Emergency Department (ED) into three types (perceived, real and ideal) to create a structured approach to workflow redesign using the available data. We use diverse data sources: sensor-based location tracking through Radio-Frequency Identification (RFID), summary EHR usage data logs, and data from physician interviews augmented by direct observations (through clinician shadowing). Our goal is to discover inefficiencies and bottlenecks that can be addressed to achieve a more ideal workflow state relative to its real and perceived state. We thereby seek to demonstrate a novel data-driven approach toward iterative workflow redesign that generalizes for use in a variety of settings. We also propose types of targeted support or adjustments to offset some of the inefficiencies we noted.

18.
J Biomed Inform ; 78: 43-53, 2018 02.
Article in English | MEDLINE | ID: mdl-29277597

ABSTRACT

Modern medical information systems enable the collection of massive amounts of temporal health data. Although these data have great potential for advancing medical research, exploring them and extracting useful knowledge present significant challenges. In this work, we develop a new pattern matching technique that aims to facilitate the discovery of clinically useful knowledge from large temporal datasets. Our approach takes as input a set of temporal patterns modeling specific events of interest (e.g., doctors' knowledge, symptoms of diseases) and returns the data instances matching these patterns (e.g., patients exhibiting the specified symptoms). The resulting instances are ranked according to a significance score based on the p-value. Our experimental evaluations on a real-world dataset demonstrate the efficiency and effectiveness of our approach.
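The abstract does not define its pattern language. A minimal sketch, assuming a pattern is an ordered list of event codes in which each consecutive pair must occur within `max_gap` days, shows how such a query can return the matching patients from per-patient timelines. The data layout and function names are hypothetical, and the p-value-based significance ranking is not reproduced here.

```python
from typing import Dict, List, Sequence, Tuple

Event = Tuple[int, str]  # hypothetical layout: (day, event code)


def _extend(events: List[Event], idx: int, pattern: Sequence[str], max_gap: int) -> bool:
    """events[idx] already matches pattern[0]; try to match the rest of the pattern after it."""
    if len(pattern) == 1:
        return True
    day0 = events[idx][0]
    for j in range(idx + 1, len(events)):
        day, code = events[j]
        if day - day0 > max_gap:
            break  # events are sorted by day, so no later event can satisfy the gap
        # Backtrack over every candidate so an early partial match cannot hide a valid later one.
        if code == pattern[1] and _extend(events, j, pattern[1:], max_gap):
            return True
    return False


def matches(events: List[Event], pattern: Sequence[str], max_gap: int) -> bool:
    if not pattern:
        raise ValueError("pattern must contain at least one event code")
    ordered = sorted(events)  # same-day events end up ordered by code, so same-day order is not meaningful
    return any(code == pattern[0] and _extend(ordered, i, pattern, max_gap)
               for i, (_, code) in enumerate(ordered))


def find_matching_patients(timelines: Dict[str, List[Event]], pattern: Sequence[str], max_gap: int) -> List[str]:
    return sorted(pid for pid, events in timelines.items() if matches(events, pattern, max_gap))


timelines = {
    "p1": [(0, "A"), (10, "A"), (12, "B")],  # A at day 10 is followed by B two days later
    "p2": [(0, "A"), (20, "B")],             # B follows A, but only after 20 days
    "p3": [(3, "B"), (5, "A")],              # B occurs before A
}
print(find_matching_patients(timelines, ["A", "B"], max_gap=5))  # -> ['p1']
```

Backtracking is used instead of a greedy scan because greedily taking the first "A" in `p1` would miss the valid match that starts at day 10.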


Subjects
Data Mining/methods , Electronic Health Records/classification , Patients/classification , Pattern Recognition, Automated/methods , Data Curation , Databases, Factual , Delivery of Health Care , Humans , Time Factors
19.
J Biomed Inform ; 69: 86-92, 2017 05.
Article in English | MEDLINE | ID: mdl-28389234

ABSTRACT

Annotating unstructured text in electronic health record (EHR) data is usually a necessary step for conducting machine learning research on such datasets. Manual annotation by domain experts provides data of the best quality, but has become increasingly impractical given the rapid increase in the volume of EHR data. In this article, we examine the effectiveness of crowdsourcing with unscreened online workers as an alternative for transforming unstructured text in EHRs into annotated data that are directly usable in supervised learning models. We find the crowdsourced annotation data to be just as effective as expert data in training a sentence classification model to detect mentions of abnormal ear anatomy in radiology reports from audiology. Furthermore, we found that allowing workers to self-report a confidence level for each annotation can help researchers pinpoint less accurate annotations that require expert scrutiny. Our findings suggest that even crowd workers without specific domain knowledge can contribute effectively to annotating unstructured EHR datasets.
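The abstract reports that self-reported confidence helps pinpoint annotations needing expert review, without giving the scale or cutoff used. A minimal sketch, assuming a hypothetical 1 to 5 confidence scale and a fixed threshold, shows how low-confidence crowd annotations could be routed to an expert while the rest are kept as training labels. The `Annotation` record, its label encoding, and the threshold value are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Annotation:
    sentence_id: str
    label: int       # hypothetical encoding: 1 = mentions abnormal ear anatomy, 0 = does not
    confidence: int  # worker's self-reported confidence, assumed here to use a 1 (low) to 5 (high) scale


def split_by_confidence(annotations: List[Annotation], threshold: int) -> Tuple[List[Annotation], List[Annotation]]:
    """Keep annotations at or above `threshold`; route the rest to expert review."""
    accepted = [a for a in annotations if a.confidence >= threshold]
    for_review = [a for a in annotations if a.confidence < threshold]
    return accepted, for_review


crowd = [
    Annotation("s1", 1, 5),
    Annotation("s2", 0, 2),
    Annotation("s3", 1, 4),
]
accepted, for_review = split_by_confidence(crowd, threshold=3)
print([a.sentence_id for a in for_review])  # -> ['s2']
```

Raising the threshold sends more annotations to experts, trading annotation cost for label quality.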


Subjects
Crowdsourcing , Data Curation , Electronic Health Records , Audiology , Humans , Radiology
20.
J Biomed Inform ; 63: 22-32, 2016 10.
Article in English | MEDLINE | ID: mdl-27444186

ABSTRACT

Information extraction from narrative clinical notes is useful for patient care, as well as for secondary use of medical data for research or clinical purposes. Many studies have focused on information extraction from English clinical texts, but fewer have dealt with clinical notes in other languages. This study tested the feasibility of using "off the shelf" information extraction algorithms to identify medical concepts in Italian clinical notes. Among the available, well-established information extraction algorithms, we used MetaMap to map medical concepts to the Unified Medical Language System (UMLS). The study addressed two questions: (Q1) whether medical terms found in clinical notes and related to the semantic group of "Disorders" could be properly mapped to the Italian UMLS resources; (Q2) whether MetaMap could be used as is to extract these medical concepts from Italian clinical notes. We performed three experiments: in EXP1, we investigated how many medical concepts of the "Disorders" semantic group found in a set of Italian clinical notes could be mapped to the Italian UMLS medical sources; in EXP2, we assessed how the different processing steps used by MetaMap, which are English dependent, could be applied to Italian texts to map the original clinical notes to the Italian UMLS sources; in EXP3, we automatically translated the clinical notes from Italian to English using Google Translator and then used MetaMap to map the translated texts. Results in EXP1 showed that the Italian UMLS Metathesaurus sources covered 91% of the medical terms of the "Disorders" semantic group found in the studied dataset. Although MetaMap was built to analyze English texts, most of its processing steps also worked properly with Italian texts. MetaMap correctly identified about half of the concepts in the Italian clinical notes. Using MetaMap's annotation on Italian clinical notes instead of a simple text search improved our results by about 15 percentage points. MetaMap's annotation of Italian clinical notes showed recall, precision and F-measure of 0.53, 0.98 and 0.69, respectively. Most of the failures were due to MetaMap's inability to generate meaningful variants for the Italian language, suggesting that modifying MetaMap to generate Italian variants could improve performance. MetaMap's performance in annotating automatically translated English clinical notes was in line with findings in the literature, with similar recall (0.75) and F-measure (0.83) and even higher precision (0.95). Most of these failures were due to poor Italian-to-English translation of medical terms, suggesting that an automatic translation tool specialized in medical concepts might yield better performance. In conclusion, the performance obtained using MetaMap on fully automatic translations of the Italian text is good enough to allow MetaMap to be used "as is" in clinical practice.
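The reported F-measure is assumed here to be the balanced F1 score, the harmonic mean of precision and recall, F = 2PR / (P + R); the abstract does not state which variant was used. The check below recomputes it from the rounded precision and recall reported for the untranslated Italian notes. Because the inputs are rounded, a recomputed value can differ from a reported figure in the last digit.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Rounded values reported for MetaMap on the original Italian notes.
print(round(f_measure(precision=0.98, recall=0.53), 2))  # -> 0.69
```

The harmonic mean is pulled toward the lower of the two scores, which is why high precision (0.98) with low recall (0.53) yields an F-measure well below the arithmetic mean.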


Subjects
Information Storage and Retrieval , Natural Language Processing , Unified Medical Language System , Algorithms , Feasibility Studies , Humans , Italy