Results 1 - 20 of 194
1.
J Med Internet Res; 26: e47682, 2024 May 31.
Article in English | MEDLINE | ID: mdl-38820575

ABSTRACT

The health sector is highly digitized, enabling the collection of vast quantities of electronic data about health and well-being. These data are collected by a diverse array of information and communication technologies, including systems used by health care organizations, consumer and community sources such as information collected on the web, and passively collected data from technologies such as wearables and devices. Understanding the breadth of information technologies that collect these data, and how those data can be actioned, is a challenge for the significant portion of the digital health workforce who interact with health data as part of their duties but are not informatics experts. This viewpoint aims to present a taxonomy categorizing common information and communication technologies that collect electronic health data. An initial classification of key information systems collecting electronic health data was undertaken via a rapid review of the literature. Subsequently, a purposeful search of the scholarly and gray literature was undertaken to extract key information about the systems within each category, generate definitions of the systems, and describe their strengths and limitations.


Subjects
Health Information Systems; Humans; Electronic Health Records/classification
2.
Pharmacol Res Perspect; 8(6): e00687, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33280248

ABSTRACT

Characterizing long-term prescription data is challenging due to the time-varying nature of drug use. Conventional approaches summarize time-varying data into categorical variables based on simple measures, such as cumulative dose, while ignoring patterns of use. The loss of information can lead to misclassification and biased estimates of the exposure-outcome association. We introduce a classification method to characterize longitudinal prescription data with an unsupervised machine learning algorithm. We used administrative databases covering virtually all 1.3 million residents of Manitoba, explicitly designed features to describe the average dose, proportion of days covered (PDC), dose change, and dose variability, and clustered the resulting feature space using K-means clustering. We applied this method to metformin use in diabetes patients. We identified 27,786 metformin users and showed that the feature distributions of their metformin use are stable across varying lengths of follow-up and have clear interpretations. We found six distinct metformin user groups: patients with intermittent use, decreasing dose, increasing dose, high dose, and two medium-dose groups (one with stable dose and one with highly variable use). Patients in the variable and decreasing dose groups had a higher chance of progression of diabetes than other patients. The method presented in this paper allows for characterization of drug use into distinct and clinically relevant groups in a way that cannot be obtained from merely classifying use by quantiles of overall use.
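The feature-engineering step described above can be sketched in a few lines. The function name and the record layout below are illustrative assumptions, not the authors' code:

```python
# Sketch of the paper's feature engineering (hypothetical names): summarize one
# patient's prescription fills into the four features the authors cluster on --
# average dose, proportion of days covered (PDC), dose change, and variability.
from statistics import mean, pstdev

def prescription_features(fills, follow_up_days):
    """fills: list of (start_day, days_supply, daily_dose) tuples."""
    covered = set()
    doses = []
    for start, supply, dose in fills:
        covered.update(range(start, start + supply))
        doses.append(dose)
    avg_dose = mean(doses)
    pdc = len([d for d in covered if d < follow_up_days]) / follow_up_days
    dose_change = doses[-1] - doses[0]   # first-to-last dose trend
    dose_variability = pstdev(doses)     # spread of the daily doses
    return avg_dose, pdc, dose_change, dose_variability

# Example: three metformin fills over a 90-day follow-up window
feats = prescription_features([(0, 30, 500), (30, 30, 750), (60, 30, 1000)], 90)
```

In the paper, vectors like `feats` would then be clustered with K-means to obtain the user groups.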


Subjects
Databases, Factual/classification; Diabetes Mellitus/drug therapy; Diabetes Mellitus/epidemiology; Electronic Health Records/classification; Hypoglycemic Agents/therapeutic use; Metformin/therapeutic use; Adult; Aged; Algorithms; Dose-Response Relationship, Drug; Female; Follow-Up Studies; Humans; Male; Middle Aged; Ontario/epidemiology; Universal Health Care
3.
J Am Med Inform Assoc; 27(8): 1235-1243, 2020 Aug 1.
Article in English | MEDLINE | ID: mdl-32548637

ABSTRACT

OBJECTIVE: A major bottleneck hindering utilization of electronic health record data for translational research is the lack of precise phenotype labels. Chart review as well as rule-based and supervised phenotyping approaches require laborious expert input, hampering applicability to studies that require many phenotypes to be defined and labeled de novo. Though International Classification of Diseases codes are often used as surrogates for true labels in this setting, these sometimes suffer from poor specificity. We propose a fully automated topic modeling algorithm to simultaneously annotate multiple phenotypes. MATERIALS AND METHODS: Surrogate-guided ensemble latent Dirichlet allocation (sureLDA) is a label-free multidimensional phenotyping method. It first uses the PheNorm algorithm to initialize probabilities based on 2 surrogate features for each target phenotype, and then leverages these probabilities to constrain the LDA topic model to generate phenotype-specific topics. Finally, it combines phenotype-feature counts with surrogates via clustering ensemble to yield final phenotype probabilities. RESULTS: sureLDA achieves reliably high accuracy and precision across a range of simulated and real-world phenotypes. Its performance is robust to phenotype prevalence and relative informativeness of surrogate vs nonsurrogate features. It also exhibits powerful feature selection properties. DISCUSSION: sureLDA combines attractive properties of PheNorm and LDA to achieve high accuracy and precision robust to diverse phenotype characteristics. It offers particular improvement for phenotypes insufficiently captured by a few surrogate features. Moreover, sureLDA's feature selection ability enables it to handle high feature dimensions and produce interpretable computational phenotypes. CONCLUSIONS: sureLDA is well suited toward large-scale electronic health record phenotyping for highly multiphenotype applications such as phenome-wide association studies.


Subjects
Algorithms; Electronic Health Records; Natural Language Processing; Electronic Health Records/classification; Humans; Precision Medicine; ROC Curve; Translational Research, Biomedical
5.
Nat Commun; 11(1): 2536, 2020 May 21.
Article in English | MEDLINE | ID: mdl-32439869

ABSTRACT

Electronic health records (EHR) are rich heterogeneous collections of patient health information, whose broad adoption provides clinicians and researchers unprecedented opportunities for health informatics, disease-risk prediction, actionable clinical recommendations, and precision medicine. However, EHRs present several modeling challenges, including highly sparse data matrices, noisy irregular clinical notes, arbitrary biases in billing code assignment, diagnosis-driven lab tests, and heterogeneous data types. To address these challenges, we present MixEHR, a multi-view Bayesian topic model. We demonstrate MixEHR on MIMIC-III, Mayo Clinic Bipolar Disorder, and Quebec Congenital Heart Disease EHR datasets. Qualitatively, MixEHR disease topics reveal meaningful combinations of clinical features across heterogeneous data types. Quantitatively, we observe superior prediction accuracy of diagnostic codes and lab test imputations compared to state-of-the-art methods. We leverage the inferred patient topic mixtures to classify target diseases and predict mortality of patients in critical conditions. In all comparisons, MixEHR confers competitive performance and reveals meaningful disease-related topics.


Subjects
Electronic Health Records/classification; Medical Informatics/methods; Bayes Theorem; Databases, Factual; Electronic Health Records/statistics & numerical data; Humans; Machine Learning; Models, Statistical; Phenotype
6.
PLoS One; 15(5): e0232840, 2020.
Article in English | MEDLINE | ID: mdl-32396579

ABSTRACT

Individual electronic health records (EHRs) and clinical reports are often part of a larger sequence; for example, a single patient may generate multiple reports over the trajectory of a disease. In applications such as cancer pathology reports, it is necessary not only to extract information from individual reports, but also to capture aggregate information regarding the entire cancer case based on case-level context from all reports in the sequence. In this paper, we introduce a simple modular add-on for capturing case-level context that is designed to be compatible with most existing deep learning architectures for text classification on individual reports. We test our approach on a corpus of 431,433 cancer pathology reports, and we show that incorporating case-level context significantly boosts classification accuracy across six classification tasks: site, subsite, laterality, histology, behavior, and grade. We expect that with minimal modifications, our add-on can be applied towards a wide range of other clinical text-based tasks.
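The case-level-context idea can be sketched minimally: augment each report's feature vector with an aggregate of all report vectors from the same case, so a downstream classifier sees both the individual report and its case-level context. The names and the element-wise mean aggregator below are assumptions; the paper's add-on is a trainable module:

```python
# Hedged sketch of a "case-level context" add-on: each report vector is
# concatenated with the element-wise mean of all report vectors in its case.
def add_case_context(report_vectors):
    """report_vectors: list of equal-length feature vectors for one case."""
    n = len(report_vectors)
    dim = len(report_vectors[0])
    case_mean = [sum(v[i] for v in report_vectors) / n for i in range(dim)]
    # Concatenate each report's own features with the shared case summary.
    return [v + case_mean for v in report_vectors]

augmented = add_case_context([[1.0, 0.0], [0.0, 1.0]])
# Each augmented vector is the original followed by the case mean [0.5, 0.5].
```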


Subjects
Electronic Health Records/classification; Neoplasms/pathology; Histological Techniques; Humans; Natural Language Processing; SEER Program
7.
J Am Med Inform Assoc; 27(6): 877-883, 2020 Jun 1.
Article in English | MEDLINE | ID: mdl-32374408

ABSTRACT

OBJECTIVE: Accurate electronic phenotyping is essential to support collaborative observational research. Supervised machine learning methods can be used to train phenotype classifiers in a high-throughput manner using imperfectly labeled data. We developed 10 phenotype classifiers using this approach and evaluated performance across multiple sites within the Observational Health Data Sciences and Informatics (OHDSI) network. MATERIALS AND METHODS: We constructed classifiers using the Automated PHenotype Routine for Observational Definition, Identification, Training and Evaluation (APHRODITE) R-package, an open-source framework for learning phenotype classifiers using datasets in the Observational Medical Outcomes Partnership Common Data Model. We labeled training data based on the presence of multiple mentions of disease-specific codes. Performance was evaluated on cohorts derived using rule-based definitions and real-world disease prevalence. Classifiers were developed and evaluated across 3 medical centers, including 1 international site. RESULTS: Compared to the multiple mentions labeling heuristic, classifiers showed a mean recall boost of 0.43 with a mean precision loss of 0.17. Performance decreased slightly when classifiers were shared across medical centers, with mean recall and precision decreasing by 0.08 and 0.01, respectively, at a site within the USA, and by 0.18 and 0.10, respectively, at an international site. DISCUSSION AND CONCLUSION: We demonstrate a high-throughput pipeline for constructing and sharing phenotype classifiers across sites within the OHDSI network using APHRODITE. Classifiers exhibit good portability between sites within the USA but limited portability internationally, indicating that classifier generalizability may have geographic limitations; consequently, sharing the classifier-building recipe, rather than the pretrained classifiers, may be more useful for facilitating collaborative observational research.
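The noisy labeling heuristic described above ("presence of multiple mentions of disease-specific codes") can be sketched in a few lines; the threshold, function name, and code set are hypothetical:

```python
# Sketch of the multiple-mentions labeling heuristic: a patient is labeled a
# probable case when their record contains at least `min_mentions` occurrences
# of disease-specific codes. Threshold and code set are illustrative.
def noisy_label(patient_codes, disease_codes, min_mentions=2):
    """patient_codes: list of code strings from one patient's record."""
    mentions = sum(1 for c in patient_codes if c in disease_codes)
    return mentions >= min_mentions

t2dm_codes = {"E11.9", "E11.65"}  # hypothetical disease-specific code set
label = noisy_label(["E11.9", "I10", "E11.9"], t2dm_codes)  # two mentions
```

Labels produced this way are imperfect, which is exactly why the paper compares the trained classifiers back against the heuristic itself.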


Subjects
Electronic Health Records/classification; Medical Informatics; Supervised Machine Learning; Classification/methods; Data Science; Humans; Observational Studies as Topic
8.
Anesthesiology; 132(4): 738-749, 2020 Apr.
Article in English | MEDLINE | ID: mdl-32028374

ABSTRACT

BACKGROUND: Accurate anesthesiology procedure code data are essential to quality improvement, research, and reimbursement tasks within anesthesiology practices. Advanced data science techniques, including machine learning and natural language processing, offer opportunities to develop classification tools for Current Procedural Terminology codes across anesthesia procedures. METHODS: Models were created using a Train/Test dataset including 1,164,343 procedures from 16 academic and private hospitals. Five supervised machine learning models were created to classify anesthesiology Current Procedural Terminology codes, with accuracy defined as first choice classification matching the institutional-assigned code existing in the perioperative database. The two best performing models were further refined and tested on a Holdout dataset from a single institution distinct from Train/Test. A tunable confidence parameter was created to identify cases for which models were highly accurate, with the goal of at least 95% accuracy, above the reported 2018 Centers for Medicare and Medicaid Services (Baltimore, Maryland) fee-for-service accuracy. Actual submitted claim data from billing specialists were used as a reference standard. RESULTS: Support vector machine and neural network label-embedding attentive models were the best performing models, respectively, demonstrating overall accuracies of 87.9% and 84.2% (single best code), and 96.8% and 94.0% (within top three). Classification accuracy was 96.4% in 47.0% of cases using support vector machine and 94.4% in 62.2% of cases using label-embedding attentive model within the Train/Test dataset. In the Holdout dataset, respective classification accuracies were 93.1% in 58.0% of cases and 95.0% among 62.0%. The most important feature in model training was procedure text. 
CONCLUSIONS: Through application of machine learning and natural language processing techniques, highly accurate real-time models were created for anesthesiology Current Procedural Terminology code classification. The increased processing speed and a priori targeted accuracy of this classification approach may provide performance optimization and cost reduction for quality improvement, research, and reimbursement tasks reliant on anesthesiology procedure codes.
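The tunable confidence parameter above can be illustrated as a threshold search over held-out predictions: find the smallest confidence cutoff at which the auto-coded subset meets a target accuracy, leaving lower-confidence cases for human coders. All names and inputs are assumptions, not the authors' implementation:

```python
# Sketch of tuning a confidence cutoff to hit a target accuracy (e.g., 95%)
# on the subset of cases the model codes automatically.
def tune_confidence(predictions, target_accuracy=0.95):
    """predictions: list of (confidence, is_correct) pairs from a holdout set."""
    for cutoff in sorted({conf for conf, _ in predictions}):
        kept = [ok for conf, ok in predictions if conf >= cutoff]
        if kept and sum(kept) / len(kept) >= target_accuracy:
            return cutoff, len(kept) / len(predictions)  # cutoff, coverage
    return None, 0.0  # no cutoff reaches the target

# Toy holdout set: accuracy 0.75 overall, 1.0 above confidence 0.9
cutoff, coverage = tune_confidence([(0.99, True), (0.9, True), (0.8, False), (0.7, True)])
```

This mirrors the reported trade-off in the abstract: higher target accuracy is achieved on a fraction of cases, quantified as coverage.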


Subjects
Current Procedural Terminology; Databases, Factual/classification; Electronic Health Records/classification; Machine Learning/classification; Neural Networks, Computer; Adolescent; Adult; Child; Child, Preschool; Female; Humans; Male; Middle Aged; Young Adult
9.
AMIA Annu Symp Proc; 2020: 273-282, 2020.
Article in English | MEDLINE | ID: mdl-33936399

ABSTRACT

Research has demonstrated cohort misclassification when studies of suicidal thoughts and behaviors (STBs) rely on ICD-9/10-CM diagnosis codes. Electronic health record (EHR) data are being explored to better identify patients, a process called EHR phenotyping. Most STB phenotyping studies have used structured EHR data, but some are beginning to incorporate unstructured clinical text. In this study, we used a publicly accessible natural language processing (NLP) program for biomedical text (MetaMap) and iterative elastic net regression to extract and select predictive text features from the discharge summaries of 810 inpatient admissions of interest. Initial sets of 5,866 and 2,709 text features were reduced to 18 and 11, respectively. The two models fit with these features obtained an area under the receiver operating characteristic curve of 0.866-0.895 and an area under the precision-recall curve of 0.800-0.838, demonstrating the approach's potential to identify textual features to incorporate in phenotyping models.


Subjects
Algorithms; Data Mining/methods; Electronic Health Records/classification; Natural Language Processing; Suicide, Attempted/classification; Cohort Studies; Female; Humans; International Classification of Diseases; Machine Learning; Male; Phenotype; Prevalence; ROC Curve
10.
Neural Netw; 121: 132-139, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31541881

ABSTRACT

Neural networks (NNs) have become the state of the art in many machine learning applications, such as image and sound processing (LeCun et al., 2015) and natural language processing (Young et al., 2017; Linggard et al., 2012). However, the success of NNs remains dependent on the availability of large labelled datasets, such as in the case of electronic health records (EHRs). With scarce labelled data, NNs are unlikely to be able to extract such hidden information with practical accuracy. In this study, we develop an approach that solves these problems for named entity recognition, obtaining a 94.6 F1 score in the I2B2 2009 Medical Extraction Challenge (Uzuner et al., 2010), 4.3 above the architecture that won the competition. To achieve this, we bootstrap our NN models through transfer learning by pretraining word embeddings on a secondary task performed on a large pool of unannotated EHRs and using the output embeddings as a foundation for a range of NN architectures. Beyond the official I2B2 challenge, we further achieve 82.4 F1 on extracting relationships between medical terms using attention-based seq2seq models bootstrapped in the same manner.


Subjects
Electronic Health Records/classification; Machine Learning/classification; Natural Language Processing; Neural Networks, Computer; Data Collection/classification; Data Collection/methods; Humans
11.
IEEE Trans Cybern; 50(2): 536-549, 2020 Feb.
Article in English | MEDLINE | ID: mdl-30273180

ABSTRACT

Many real-world optimization problems can only be solved with a data-driven approach, simply because no analytic objective functions are available for evaluating candidate solutions. In this paper, we address a class of expensive data-driven constrained multiobjective combinatorial optimization problems, where the objectives and constraints can be calculated only on the basis of a large amount of data. To solve this class of problems, we propose using random forests (RFs) and radial basis function networks as surrogates to approximate both objective and constraint functions. In addition, logistic regression models are introduced to rectify the surrogate-assisted fitness evaluations and a stochastic ranking selection is adopted to further reduce the influences of the approximated constraint functions. Three variants of the proposed algorithm are empirically evaluated on multiobjective knapsack benchmark problems and two real-world trauma system design problems. Experimental results demonstrate that the variant using RF models as the surrogates is effective and efficient in solving data-driven constrained multiobjective combinatorial optimization problems.
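The stochastic ranking selection mentioned above is commonly implemented as a bubble-sort-like pass (after Runarsson and Yao) that compares neighbours by objective value with some probability even when constraints are violated, softening the constraints' influence. The sketch below is that generic algorithm, with illustrative names, not the paper's surrogate-assisted variant:

```python
import random

# Stochastic ranking (Runarsson & Yao style): neighbours are compared by
# objective with probability p_f when either violates a constraint, and by
# constraint violation otherwise. Both objective and violation are minimized.
def stochastic_rank(population, objective, violation, p_f=0.45, sweeps=None):
    ranked = list(population)
    n = len(ranked)
    for _ in range(sweeps or n):
        swapped = False
        for i in range(n - 1):
            a, b = ranked[i], ranked[i + 1]
            if (violation(a) == violation(b) == 0) or random.random() < p_f:
                worse = objective(a) > objective(b)   # compare by objective
            else:
                worse = violation(a) > violation(b)   # compare by violation
            if worse:
                ranked[i], ranked[i + 1] = b, a
                swapped = True
        if not swapped:
            break
    return ranked

# With only feasible solutions, the pass reduces to a plain objective sort.
order = stochastic_rank([3, 1, 2], objective=lambda x: x, violation=lambda x: 0)
```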


Subjects
Algorithms; Decision Trees; Machine Learning; Electronic Health Records/classification; Humans; Wounds and Injuries/classification
12.
Comput Methods Programs Biomed; 188: 105264, 2020 May.
Article in English | MEDLINE | ID: mdl-31851906

ABSTRACT

BACKGROUND AND OBJECTIVE: This work deals with clinical text mining, a field of Natural Language Processing applied to biomedical informatics. The aim is to classify Electronic Health Records with respect to the International Classification of Diseases, which is the foundation for the identification of international health statistics, and the standard for reporting diseases and health conditions. Within the framework of data mining, the goal is multi-label classification, as each health record has multiple International Classification of Diseases codes assigned. We investigate five Deep Learning architectures with a dataset obtained from the Basque Country Health System, and six different perspectives derived from shifts in the input and the output. METHODS: We evaluate a Feed Forward Neural Network as the baseline and several Recurrent models based on the Bidirectional GRU architecture, putting our research focus on the text representation layer and testing three variants, from standard word embeddings to meta word embedding techniques and contextual embeddings. RESULTS: The results showed that the recurrent models outperform the non-recurrent model. The meta word embedding techniques are capable of beating the standard word embeddings, but the contextual embeddings prove the most robust for the downstream task overall. Additionally, label granularity alone has an impact on classification performance. CONCLUSIONS: The contributions of this work are a) a comparison among five classification approaches based on Deep Learning on a Spanish dataset to cope with the multi-label health text classification problem; b) the study of the impact of document length and label-set size and granularity in the multi-label context; and c) the study of measures to mitigate multi-label text classification problems related to label-set size and sparseness.


Subjects
Deep Learning; Electronic Health Records/classification; Medical Informatics; Pattern Recognition, Automated; Algorithms; Computer Graphics; Data Mining; Humans; International Classification of Diseases; Natural Language Processing; Neural Networks, Computer; Software; Spain
13.
J Am Med Inform Assoc; 27(1): 119-126, 2020 Jan 1.
Article in English | MEDLINE | ID: mdl-31722396

ABSTRACT

OBJECTIVE: Phenotyping patients using electronic health record (EHR) data conventionally requires labeled cases and controls. Assigning labels requires manual medical chart review and therefore is labor intensive. For some phenotypes, identifying gold-standard controls is prohibitive. We developed an accurate EHR phenotyping approach that does not require labeled controls. MATERIALS AND METHODS: Our framework relies on a random subset of cases, which can be specified using an anchor variable that has excellent positive predictive value and sensitivity independent of predictors. We proposed a maximum likelihood approach that efficiently leverages data from the specified cases and unlabeled patients to develop logistic regression phenotyping models, and compared model performance with existing algorithms. RESULTS: Our method outperformed the existing algorithms on predictive accuracy in Monte Carlo simulation studies, application to identify hypertension patients with hypokalemia requiring oral supplementation using a simulated anchor, and application to identify primary aldosteronism patients using real-world cases and anchor variables. Our method additionally generated consistent estimates of 2 important parameters, phenotype prevalence and the proportion of true cases that are labeled. DISCUSSION: Upon identification of an anchor variable that is scalable and transferable to different practices, our approach should facilitate development of scalable, transferable, and practice-specific phenotyping models. CONCLUSIONS: Our proposed approach enables accurate semiautomated EHR phenotyping with minimal manual labeling and therefore should greatly facilitate EHR clinical decision support and research.
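The anchor-variable setting is closely related to classic positive-unlabeled learning. As a minimal sketch of that related idea (the Elkan-Noto correction, not the paper's exact maximum likelihood estimator), a model trained to predict the anchor can be rescaled into a phenotype probability by the labeling frequency c = P(anchor = 1 | true case):

```python
# Positive-unlabeled rescaling sketch (all names hypothetical): estimate c as
# the mean anchor-model score among anchor-positive patients, then divide.
def pu_correct(anchor_probs, anchor_labels):
    """anchor_probs: model scores; anchor_labels: 1 if anchor observed."""
    positives = [p for p, a in zip(anchor_probs, anchor_labels) if a == 1]
    c = sum(positives) / len(positives)          # labeling frequency estimate
    return [min(p / c, 1.0) for p in anchor_probs]

phenotype_probs = pu_correct([0.8, 0.4, 0.1], [1, 0, 0])
# c = 0.8, so the corrected probabilities are [1.0, 0.5, 0.125]
```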


Subjects
Algorithms; Electronic Health Records/classification; Likelihood Functions; Humans; Monte Carlo Method
14.
J Am Med Inform Assoc; 27(2): 244-253, 2020 Feb 1.
Article in English | MEDLINE | ID: mdl-31617899

ABSTRACT

OBJECTIVES: The ability to identify novel risk factors for health outcomes is a key strength of electronic health record (EHR)-based research. However, the validity of such studies is limited by error in EHR-derived phenotypes. The objective of this study was to develop a novel procedure for reducing bias in estimated associations between risk factors and phenotypes in EHR data. MATERIALS AND METHODS: The proposed method combines the strengths of a gold-standard phenotype obtained through manual chart review for a small validation set of patients and an automatically-derived phenotype that is available for all patients but is potentially error-prone (hereafter referred to as the algorithm-derived phenotype). An augmented estimator of associations is obtained by optimally combining these 2 phenotypes. We conducted simulation studies to evaluate the performance of the augmented estimator and conducted an analysis of risk factors for second breast cancer events using data on a cohort from Kaiser Permanente Washington. RESULTS: The proposed method was shown to reduce bias relative to an estimator using only the algorithm-derived phenotype and reduce variance compared to an estimator using only the validation data. DISCUSSION: Our simulation studies and real data application demonstrate that, compared to the estimator using validation data only, the augmented estimator has lower variance (ie, higher statistical efficiency). Compared to the estimator using error-prone EHR-derived phenotypes, the augmented estimator has smaller bias. CONCLUSIONS: The proposed estimator can effectively combine an error-prone phenotype with gold-standard data from a limited chart review in order to improve analyses of risk factors using EHR data.
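One standard way to "optimally combine" two estimates, in the spirit of the augmented estimator above (the paper's exact weighting may differ), is inverse-variance weighting, which minimizes the variance of the combined estimate:

```python
# Inverse-variance weighting sketch: combine a low-bias but noisy estimate
# (small validation set) with a precise but possibly biased one, weighting
# each by the inverse of its variance.
def combine_estimates(est_a, var_a, est_b, var_b):
    w = var_b / (var_a + var_b)          # weight toward the lower-variance input
    combined = w * est_a + (1 - w) * est_b
    combined_var = var_a * var_b / (var_a + var_b)  # always <= min(var_a, var_b)
    return combined, combined_var

# Noisy validation-set estimate (1.2, var 0.04) vs. precise algorithm-derived
# estimate (1.0, var 0.01): the combination leans toward the precise one.
est, var = combine_estimates(1.2, 0.04, 1.0, 0.01)
```

Note the combined variance is always below either input variance, matching the abstract's claim of higher statistical efficiency than the validation-only estimator.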


Subjects
Algorithms; Electronic Health Records/classification; Bias; Data Warehousing; Humans
15.
BMJ Health Care Inform; 26(1), 2019 Dec.
Article in English | MEDLINE | ID: mdl-31848142

ABSTRACT

OBJECTIVE: Long problem lists can be challenging to use. Reorganisation of the problem list by organ system is a strategy for making long problem lists more manageable. METHODS: In a small-town primary care setting, we examined 4950 unique problem lists over 5 years (24 033 total problems and 2170 unique problems) from our electronic health record. All problems were mapped to the International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) and SNOMED CT codes. We developed two different algorithms for reorganising the problem list by organ system based on either the ICD-10-CM or the SNOMED CT code. RESULTS: The mean problem list length was 4.9±4.6 problems. The two reorganisation algorithms allocated problems to one of 15 different categories (12 aligning with organ systems). 26.2% of problems were assigned to a more general category of 'signs and symptoms' that did not correspond to a single organ system. The two algorithms were concordant in allocation by organ system for 90% of the unique problems. Since ICD-10-CM is a monohierarchic classification system, problems coded by ICD-10-CM were assigned to a single category. Since SNOMED CT is a polyhierarchical ontology, 19.4% of problems coded by SNOMED CT were assigned to multiple categories. CONCLUSION: Reorganisation of the problem list by organ system is feasible using algorithms based on either ICD-10-CM or SNOMED CT codes, and the two algorithms are highly concordant.
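Because ICD-10-CM is monohierarchic, the reorganisation algorithm can key on a code's leading character to assign a single chapter-based category. A minimal sketch covering a few chapters follows; the paper's full 15-category mapping is not reproduced here:

```python
# Sketch of ICD-10-CM chapter-based problem-list categorisation. Only a few
# single-letter chapters are shown; a complete mapping needs letter ranges
# (e.g., C00-D49 for neoplasms) and the remaining chapters.
ICD10_CHAPTERS = {
    "I": "Circulatory system",
    "J": "Respiratory system",
    "K": "Digestive system",
    "N": "Genitourinary system",
    "R": "Signs and symptoms",   # the non-organ-system catch-all category
}

def organ_system(icd10_code):
    """Map an ICD-10-CM code to one category via its leading character."""
    return ICD10_CHAPTERS.get(icd10_code[0], "Other")

category = organ_system("I10")  # essential hypertension -> "Circulatory system"
```

The SNOMED CT variant described in the paper cannot use such a one-to-one lookup, since its polyhierarchy allows a problem to fall under multiple categories.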


Subjects
Algorithms; Electronic Health Records/classification; Electronic Health Records/standards; Health Information Management; Humans; International Classification of Diseases; Primary Health Care; Systematized Nomenclature of Medicine
16.
BMC Med Inform Decis Mak; 19(Suppl 6): 263, 2019 Dec 19.
Article in English | MEDLINE | ID: mdl-31856819

ABSTRACT

BACKGROUND: Sequence alignment is a way of arranging sequences (e.g., DNA, RNA, protein, natural language, financial data, or medical events) to identify the relatedness between two or more sequences and regions of similarity. For Electronic Health Records (EHR) data, sequence alignment helps to identify patients with similar disease trajectories for more relevant and precise prognosis, diagnosis, and treatment. METHODS: We tested two widely used global sequence alignment methods, namely dynamic time warping (DTW) and the Needleman-Wunsch algorithm (NWA), together with their local modifications, DTW for Local alignment (DTWL) and the Smith-Waterman algorithm (SWA), for aligning patient medical records. We also used 4 sets of synthetic patient medical records generated from a large real-world EHR database as gold-standard data to objectively evaluate these sequence alignment algorithms. RESULTS: For global sequence alignments, 47 out of 80 DTW alignments and 11 out of 80 NWA alignments had superior similarity scores to the reference alignments, while the remaining 33 DTW alignments and 69 NWA alignments had the same similarity scores as the reference alignments. Forty-six out of 80 DTW alignments had better similarity scores than the NWA alignments, with the remaining 34 cases having equal similarity scores from both algorithms. For local sequence alignments, 70 out of 80 DTWL alignments and 68 out of 80 SWA alignments had larger coverage and higher similarity scores than the reference alignments, while the remaining DTWL and SWA alignments received the same coverage and similarity scores as the reference alignments. Six out of 80 DTWL alignments showed larger coverage and higher similarity scores than the SWA alignments. Thirty DTWL alignments had equal coverage but better similarity scores than SWA. DTWL and SWA received equal coverage and similarity scores in the remaining 44 cases. CONCLUSIONS: DTW, NWA, DTWL and SWA outperformed the reference alignments. DTW (or DTWL) appears to align better than NWA (or SWA) by inserting new daily events and identifying more similarities between patient medical records. The evaluation results could provide valuable information on the strengths and weaknesses of these sequence alignment methods for future development of sequence alignment methods and patient similarity-based studies.
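For concreteness, a compact Needleman-Wunsch global alignment score over event sequences, in the spirit of the comparison above; the match/mismatch/gap parameters are illustrative, not the paper's:

```python
# Needleman-Wunsch global alignment score via dynamic programming.
# Works on any indexable sequences, here daily medical-event sequences.
def needleman_wunsch_score(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):            # aligning a prefix against nothing
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (
                match if seq_a[i - 1] == seq_b[j - 1] else mismatch)
            score[i][j] = max(diag,                 # match/mismatch
                              score[i - 1][j] + gap,  # gap in seq_b
                              score[i][j - 1] + gap)  # gap in seq_a
    return score[-1][-1]

# Two short event sequences differing by one inserted event.
s = needleman_wunsch_score(["visit", "lab", "rx"], ["visit", "rx"])  # -> 1
```

Smith-Waterman differs only in clamping cell scores at zero and taking the matrix maximum, which yields the local alignments discussed above.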


Subjects
Cross-Cultural Comparison; Electronic Health Records/statistics & numerical data; Sequence Alignment; Algorithms; Diagnosis; Electronic Health Records/classification; Humans; Prognosis; Therapeutics
17.
J Biomed Inform; 99: 103310, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31622801

ABSTRACT

BACKGROUND: Standards-based clinical data normalization has become a key component of effective data integration and accurate phenotyping for secondary use of electronic healthcare records (EHR) data. HL7 Fast Healthcare Interoperability Resources (FHIR) is an emerging clinical data standard for exchanging electronic healthcare data and has been used in modeling and integrating both structured and unstructured EHR data for a variety of clinical research applications. The overall objective of this study is to develop and evaluate a FHIR-based EHR phenotyping framework for identification of patients with obesity and its multiple comorbidities from semi-structured discharge summaries, leveraging a FHIR-based clinical data normalization pipeline (known as NLP2FHIR). METHODS: We implemented a multi-class and multi-label classification system based on the i2b2 Obesity Challenge task to evaluate the FHIR-based EHR phenotyping framework. Two core parts of the framework are: (a) the conversion of discharge summaries into corresponding FHIR resources - Composition, Condition, MedicationStatement, Procedure and FamilyMemberHistory - using the NLP2FHIR pipeline, and (b) the implementation of four machine learning algorithms (logistic regression, support vector machine, decision tree, and random forest) to train classifiers to predict disease state of obesity and 15 comorbidities using features extracted from standard FHIR resources and terminology expansions. We used the macro- and micro-averaged precision (P), recall (R), and F1 score (F1) measures to evaluate classifier performance. We validated the framework using a second obesity dataset extracted from the MIMIC-III database. RESULTS: Using the NLP2FHIR pipeline, 1237 clinical discharge summaries from the 2008 i2b2 obesity challenge dataset were represented as instances of the FHIR Composition resource, consisting of 5677 records with 16 unique section types. After the NLP processing and FHIR modeling, a set of 244,438 FHIR clinical resource instances were generated. Of the four machine learning classifiers, the random forest algorithm performed best, with F1-micro(0.9466)/F1-macro(0.7887) and F1-micro(0.9536)/F1-macro(0.6524) for intuitive classification (reflecting medical professionals' judgments) and textual classification (reflecting judgments based on explicitly reported information of diseases), respectively. The MIMIC-III obesity dataset was successfully integrated for prediction with minimal configuration of the NLP2FHIR pipeline and machine learning models. CONCLUSIONS: The study demonstrated that the FHIR-based EHR phenotyping approach could effectively identify the state of obesity and multiple comorbidities using semi-structured discharge summaries. Our FHIR-based phenotyping approach is a first concrete step towards improving the data aspect of phenotyping portability across EHR systems and enhancing interpretability of machine learning-based phenotyping algorithms.


Subjects
Electronic Health Records/classification, Health Information Interoperability, Obesity/epidemiology, Patient Discharge, Adult, Algorithms, Body Mass Index, Comorbidity, Female, Humans, Machine Learning, Male, Phenotype, Software
18.
J Biomed Inform ; 99: 103285, 2019 11.
Article in English | MEDLINE | ID: mdl-31546016

ABSTRACT

This work presents a two-stage deep learning system for Named Entity Recognition (NER) and Relation Extraction (RE) from medical texts. These tasks are a crucial step for many natural language understanding applications in the biomedical domain. Automatic medical coding of electronic medical records, automated summarization of patient records, automatic cohort identification for clinical studies, text simplification of health documents for patients, early detection of adverse drug reactions, and automatic identification of risk factors are only a few examples of the many opportunities that text analysis can offer in the clinical domain. In this work, our efforts are primarily directed towards improving the pharmacovigilance process through the automatic detection of drug-drug interactions (DDI) from texts. Moreover, we deal with the semantic analysis of texts containing health information for patients. Our two-stage approach is based on deep learning architectures. Concretely, NER is performed by combining a bidirectional Long Short-Term Memory (Bi-LSTM) with a Conditional Random Field (CRF), while RE applies a Convolutional Neural Network (CNN). Since our approach uses very few language resources (only pre-trained word embeddings) and does not exploit any domain resources (such as dictionaries or ontologies), it can easily be extended to support other languages and clinical applications that require the exploitation of semantic information (concepts and relationships) from texts. In recent years, the task of DDI extraction has received great attention from the BioNLP community. However, the problem has traditionally been evaluated as two separate subtasks: drug name recognition and extraction of DDIs. To the best of our knowledge, this is the first work that provides an evaluation of the whole pipeline. 
Moreover, our system obtains state-of-the-art results on the eHealth-KD challenge, which was part of the Workshop on Semantic Analysis at SEPLN (TASS-2018).
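A sequence labeller like the Bi-LSTM-CRF described above emits one BIO tag per token; turning those tags into entity spans is a small, model-independent decoding step. The sketch below shows one common way to do it (not the authors' code; the tag names are illustrative):

```python
def bio_to_spans(tags):
    """Decode a BIO tag sequence into (entity_type, start, end) spans.

    `tags` is the per-token output of a sequence labeller,
    e.g. ["B-DRUG", "I-DRUG", "O", ...].  The end index is exclusive,
    so tokens[start:end] recovers the entity mention.
    """
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # "O" sentinel flushes the last span
        if tag.startswith("B-") or tag == "O":
            if start is not None:
                spans.append((etype, start, i))
                start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            # Tolerate an I- tag with no preceding B-, a common
            # post-hoc repair when decoding imperfect model output.
            start, etype = i, tag[2:]
    return spans
```

In a DDI pipeline, the spans decoded here would then be paired up and passed to the CNN-based relation classifier as candidate drug mentions.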


Subjects
Data Mining/methods, Deep Learning, Electronic Health Records/classification, Clinical Coding, Drug Interactions, Humans
19.
J Biomed Inform ; 99: 103293, 2019 11.
Article in English | MEDLINE | ID: mdl-31542521

ABSTRACT

BACKGROUND: Implementation of phenotype algorithms requires phenotype engineers to interpret human-readable algorithms and translate the description (text and flowcharts) into computable phenotypes - a process that can be labor intensive and error prone. To address the critical need for reducing implementation effort, it is important to develop portable algorithms. METHODS: We conducted a retrospective analysis of phenotype algorithms developed in the Electronic Medical Records and Genomics (eMERGE) network and identified common customization tasks required for implementation. A novel scoring system was developed to quantify portability from three aspects: Knowledge conversion, clause Interpretation, and Programming (KIP). Tasks were grouped into twenty representative categories. Experienced phenotype engineers were asked to estimate the average time spent on each category and evaluate the time saving enabled by a common data model (CDM), specifically the Observational Medical Outcomes Partnership (OMOP) model, for each category. RESULTS: A total of 485 distinct clauses (phenotype criteria) were identified from 55 phenotype algorithms, corresponding to 1153 customization tasks. In addition to 25 non-phenotype-specific tasks, 46 tasks are related to interpretation, 613 tasks are related to knowledge conversion, and 469 tasks are related to programming. A score between 0 and 2 (0 for easy, 1 for moderate, and 2 for difficult portability) is assigned for each aspect, yielding a total KIP score range of 0 to 6. The average clause-wise KIP score to reflect portability is 1.37 ± 1.38. Specifically, the average knowledge (K) score is 0.64 ± 0.66, interpretation (I) score is 0.33 ± 0.55, and programming (P) score is 0.40 ± 0.64. 5% of the categories can be completed within one hour (median). 70% of the categories take from days to months to complete. The OMOP model can assist with vocabulary mapping tasks. 
CONCLUSION: This study presents firsthand knowledge of the substantial implementation efforts in phenotyping and introduces a novel metric (KIP) to measure the portability of phenotype algorithms, quantifying such efforts across the eMERGE Network. Phenotype developers are encouraged to analyze and optimize portability with regard to knowledge, interpretation, and programming. CDMs can be used to improve portability for some 'knowledge-oriented' tasks.
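The KIP scheme described above is simple to operationalize: each clause gets a 0-2 rating on each of the three aspects, and the per-clause total (0-6) is averaged across clauses. A minimal sketch, with made-up example clauses rather than values from the eMERGE analysis:

```python
from statistics import mean, stdev

# Illustrative KIP scoring for phenotype-algorithm clauses.  Each clause
# is rated 0 (easy), 1 (moderate), or 2 (difficult) on Knowledge
# conversion (K), clause Interpretation (I), and Programming (P),
# giving a per-clause total of 0-6.  These clauses are invented examples.
clauses = [
    {"name": "ICD code lookup",     "K": 0, "I": 0, "P": 0},
    {"name": "free-text BMI parse", "K": 2, "I": 1, "P": 2},
    {"name": "med exposure window", "K": 1, "I": 0, "P": 1},
]

def kip_total(clause):
    """Per-clause KIP score: the sum of the three aspect ratings."""
    return clause["K"] + clause["I"] + clause["P"]

totals = [kip_total(c) for c in clauses]
# Summarize portability as mean ± sample standard deviation, the same
# form as the paper's reported 1.37 ± 1.38.
print("clause-wise KIP: %.2f ± %.2f" % (mean(totals), stdev(totals)))
# prints: clause-wise KIP: 2.33 ± 2.52
```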


Subjects
Electronic Health Records/classification, Medical Informatics/methods, Algorithms, Genomics, Humans, Phenotype, Retrospective Studies
20.
Int J Risk Saf Med ; 30(3): 129-153, 2019.
Article in English | MEDLINE | ID: mdl-31476171

ABSTRACT

OBJECTIVE: To compare primary medical adverse event keywords from reporter (e.g. physician and nurse) and harm level perspectives, and to explore the underlying behaviors behind medical adverse events using social network analysis (SNA) and latent Dirichlet allocation (LDA), leading to process improvements. DESIGN: Used SNA methods to explore the primary keywords used to describe medical adverse events reported by physicians and nurses. Used LDA methods to investigate the topics associated with various harm levels. Combined the SNA and LDA methods to discover common shared topic keywords and better understand the underlying behaviors of physicians and nurses across harm levels of medical adverse events. SETTING: Maccabi Healthcare Community, the second largest healthcare organization in Israel. DATA: 17,868 medical adverse event records collected between 2000 and 2017. METHODS: Big data analysis techniques using social network analysis (SNA) and latent Dirichlet allocation (LDA). RESULTS: Shared topic keywords used by both physicians and nurses were determined. The study revealed that communication, information transfer, and inattentiveness were the most common problems reported in the medical adverse event data. CONCLUSIONS: Communication and inattentiveness were the most common problems reported in medical adverse events regardless of the reporting healthcare professional or harm level. Findings suggested that an information-sharing and feedback mechanism should be implemented to eliminate preventable medical adverse events. Healthcare institution managers and government officials should take targeted actions to decrease these preventable medical adverse events through quality improvement efforts.
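The usual starting point for the kind of keyword-based SNA described above is a weighted co-occurrence network: keywords are nodes, and an edge's weight is the number of reports mentioning both endpoints. A minimal sketch with invented sample reports (not Maccabi data):

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(reports):
    """Build a weighted keyword co-occurrence network from event reports.

    Each report is a set of keywords; the edge (a, b) is weighted by the
    number of reports mentioning both a and b.  Sorting each pair makes
    the undirected edge (a, b) identical to (b, a).
    """
    edges = Counter()
    for keywords in reports:
        for a, b in combinations(sorted(keywords), 2):
            edges[(a, b)] += 1
    return edges

# Invented example reports, one keyword set per adverse event record.
reports = [
    {"communication", "handoff", "inattentiveness"},
    {"communication", "handoff"},
    {"dosage", "inattentiveness"},
]
edges = cooccurrence_edges(reports)
```

From this edge list, standard SNA measures (degree centrality, community detection) identify which keywords, such as communication-related ones here, sit at the center of the reported-event network.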


Subjects
Electronic Health Records/statistics & numerical data, Medical Errors/statistics & numerical data, Medication Errors/statistics & numerical data, Safety Management/standards, Algorithms, Databases, Factual/standards, Electronic Health Records/classification, Humans, Medical Errors/classification, Medical Errors/prevention & control, Medication Errors/classification, Medication Errors/prevention & control, Models, Statistical, Safety Management/classification