ABSTRACT
OBJECTIVE: To improve neonatal patient safety through automated detection of medication administration errors (MAEs) in high-alert medications, including narcotics, vasoactive medications, intravenous fluids, parenteral nutrition, and insulin, using the electronic health record (EHR); to evaluate rates of MAEs in neonatal care; and to compare the performance of computerized algorithms with traditional incident reporting for error detection. METHODS: We developed novel computerized algorithms to identify MAEs within the EHRs of all neonatal patients treated in a level IV neonatal intensive care unit (NICU) in 2011 and 2012. We evaluated the rates and types of MAEs identified by the automated algorithms and compared their performance with incident reporting. Performance was evaluated by physician chart review. RESULTS: In the combined 2011 and 2012 NICU data sets, the automated algorithms identified MAEs at the following rates: fentanyl, 0.4% (4 errors/1005 fentanyl administration records); morphine, 0.3% (11/4009); dobutamine, 0% (0/10); and milrinone, 0.3% (5/1925). We found higher MAE rates for other vasoactive medications: dopamine, 11.6% (5/43); epinephrine, 10.0% (289/2890); and vasopressin, 12.8% (54/421). Fluid administration error rates were similar: intravenous fluids, 3.2% (273/8567); parenteral nutrition, 3.2% (649/20,124); and lipid administration, 1.3% (203/15,227). We also found 13 insulin administration errors, a rate of 2.9% (13/456). MAE rates were higher for medications that were adjusted frequently and for fluids administered concurrently. The algorithms identified many previously unrecognized errors, demonstrating significantly better sensitivity (82% vs. 5%) and precision (70% vs. 50%) than incident reporting. CONCLUSIONS: Automated detection of medication administration errors through the EHR is feasible and performs better than currently used incident reporting systems. Automated algorithms may be useful for real-time error identification and mitigation.
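As a rough illustration of the kind of rule such algorithms encode, the sketch below flags administration records whose weight-based dose falls outside a reference range. The drug limits, record fields, and data are invented placeholders, not the study's actual rules:

```python
# Illustrative sketch of a rule-based medication administration error (MAE)
# check over EHR administration records. The reference ranges and record
# fields below are hypothetical placeholders, not the study's actual logic.

# Hypothetical per-dose limits (mcg/kg) for two drugs.
DOSE_LIMITS = {
    "fentanyl": (0.5, 4.0),
    "morphine": (20.0, 100.0),
}

def flag_mae(record):
    """Return a reason string if the record looks like an MAE, else None."""
    limits = DOSE_LIMITS.get(record["drug"])
    if limits is None:
        return None
    dose_per_kg = record["dose_mcg"] / record["weight_kg"]
    low, high = limits
    if dose_per_kg < low:
        return f"underdose: {dose_per_kg:.2f} mcg/kg below {low}"
    if dose_per_kg > high:
        return f"overdose: {dose_per_kg:.2f} mcg/kg above {high}"
    return None

administrations = [
    {"drug": "fentanyl", "dose_mcg": 12.0, "weight_kg": 2.1},  # too high
    {"drug": "morphine", "dose_mcg": 80.0, "weight_kg": 1.8},  # in range
]
for rec in administrations:
    reason = flag_mae(rec)
    if reason:
        print(rec["drug"], "->", reason)
```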
Subject(s)
Analgesics, Opioid/therapeutic use; Intensive Care Units, Neonatal; Medication Errors; Patient Safety; Risk Management; Automation; Humans; Infant, Newborn; Intensive Care, Neonatal; Medical Order Entry Systems
ABSTRACT
BACKGROUND: A high-quality gold standard is vital for supervised, machine learning-based, clinical natural language processing (NLP) systems. In clinical NLP projects, expert annotators traditionally create the gold standard. However, traditional annotation is expensive and time-consuming. To reduce the cost of annotation, general NLP projects have turned to crowdsourcing based on Web 2.0 technology, which involves submitting smaller subtasks to a coordinated marketplace of workers on the Internet. Many studies have been conducted in the area of crowdsourcing, but only a few have focused on tasks in the general NLP field and only a handful in the biomedical domain, usually based on very small pilot sample sizes. In addition, the quality of crowdsourced biomedical NLP corpora has never been exceptional when compared with traditionally developed gold standards. Previously reported results on a medical named entity annotation task showed a 0.68 F-measure-based agreement between crowdsourced and traditionally developed corpora. OBJECTIVE: Building upon previous work from general crowdsourcing research, this study investigated the usability of crowdsourcing in the clinical NLP domain, with special emphasis on achieving high agreement between crowdsourced and traditionally developed corpora. METHODS: To build the gold standard for evaluating the crowdsourcing workers' performance, 1042 clinical trial announcements (CTAs) from the ClinicalTrials.gov website were randomly selected and double annotated for medication names, medication types, and linked attributes. For the experiments, we used CrowdFlower, an Amazon Mechanical Turk-based crowdsourcing platform. We calculated sensitivity, precision, and F-measure to evaluate the quality of the crowd's work, and tested for statistical significance (P<.001, chi-square test) to detect differences between the crowdsourced and traditionally developed annotations. RESULTS: Agreement between the crowd's annotations and the traditionally generated corpora was high for (1) annotations (0.87 F-measure for medication names; 0.73 for medication types), (2) correction of previous annotations (0.90 for medication names; 0.76 for medication types), and excellent for (3) linking medications with their attributes (0.96). Simple voting provided the best judgment aggregation approach. There was no statistically significant difference between the crowdsourced and traditionally generated corpora. Our results represent a 27.9% improvement over previously reported results on the medication named entity annotation task. CONCLUSIONS: This study offers three contributions. First, we showed that crowdsourcing is a feasible, inexpensive, fast, and practical approach to collecting high-quality annotations for clinical text (when protected health information is excluded). We believe that well-designed user interfaces and a rigorous quality-control strategy for entity annotation and linking were critical to the success of this work. Second, as a further contribution to the Internet-based crowdsourcing field, we will publicly release the JavaScript and CrowdFlower Markup Language infrastructure code necessary to use CrowdFlower's quality control and crowdsourcing interfaces for named entity annotations. Finally, to spur future research, we will release the CTA annotations generated by the traditional and crowdsourced approaches.
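A minimal sketch of the agreement arithmetic described above: span-level precision, recall, and F-measure against a gold standard, plus the simple-voting aggregation the study found best. The spans and labels are invented for illustration:

```python
# Sketch: span-level agreement between crowdsourced and gold annotations,
# scored on exact (start, end, label) matches. Sample data are invented.
from collections import Counter

def prf(crowd, gold):
    """Precision, recall, and F-measure for exact span matches."""
    crowd, gold = set(crowd), set(gold)
    tp = len(crowd & gold)
    precision = tp / len(crowd) if crowd else 0.0
    recall = tp / len(gold) if gold else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

def majority_vote(judgments):
    """Aggregate several workers' labels for one item by simple voting."""
    return Counter(judgments).most_common(1)[0][0]

gold = {(0, 9, "MED_NAME"), (15, 22, "MED_TYPE")}
crowd = {(0, 9, "MED_NAME"), (30, 35, "MED_TYPE")}
print("P=%.2f R=%.2f F=%.2f" % prf(crowd, gold))
print(majority_vote(["MED_NAME", "MED_NAME", "MED_TYPE"]))  # MED_NAME
```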
Subject(s)
Crowdsourcing/standards; Natural Language Processing; Social Media; Telemedicine/standards; Clinical Trials as Topic/statistics & numerical data; Crowdsourcing/statistics & numerical data; Humans; Internet; Pilot Projects; Quality Control; Telemedicine/statistics & numerical data
ABSTRACT
BACKGROUND: Cincinnati Children's Hospital Medical Center (CCHMC) has built the initial natural language processing (NLP) component to extract medications with their corresponding medical conditions (Indications, Contraindications, Overdosage, and Adverse Reactions) as triples of medication-related information ([(1) drug name]-[(2) medical condition]-[(3) LOINC section header]) for an intelligent database system, in order to improve patient safety and the quality of health care. The Food and Drug Administration's (FDA) drug labels are used to demonstrate the feasibility of building the triples as an intelligent database system task. METHODS: This paper describes a hybrid NLP system, called AutoMCExtractor, that collects medical conditions (including diseases/disorders and signs/symptoms) from drug labels published by the FDA. Altogether, 6,611 medical conditions in a manually annotated gold standard were used for the system evaluation. The pre-processing step extracted plain text from the XML files and detected eight relevant LOINC sections (e.g., Adverse Reactions, Warnings and Precautions) for medical condition extraction. Conditional random field (CRF) classifiers, trained on token, linguistic, and semantic features, were then used for medical condition extraction. Lastly, dictionary-based post-processing corrected boundary-detection errors of the CRF step. We evaluated AutoMCExtractor on manually annotated FDA drug labels and report results at both the token and span levels. RESULTS: Precision, recall, and F-measure were 0.90, 0.81, and 0.85, respectively, for span-level exact match; for the token-level evaluation, precision, recall, and F-measure were 0.92, 0.73, and 0.82, respectively. CONCLUSIONS: The results demonstrate that (1) medical conditions can be extracted from FDA drug labels with high performance and (2) it is feasible to develop a framework for an intelligent database system.
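The sketch below illustrates the CRF step of such a pipeline (token features in, BIO condition labels out), using sklearn-crfsuite as a stand-in; the paper's actual toolkit and feature set may differ, and the training sentence is invented:

```python
# Sketch of a CRF-based medical condition tagger: token features -> BIO labels.
# sklearn-crfsuite is a stand-in for whatever CRF toolkit the study used.
import sklearn_crfsuite

def token_features(sent, i):
    word = sent[i]
    return {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "suffix3": word[-3:],
        "prev": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

# Tiny invented training example: one sentence from an "Adverse Reactions"
# section, labeled in BIO format for medical conditions.
sent = ["Patients", "may", "experience", "severe", "nausea", "."]
labels = ["O", "O", "O", "B-CONDITION", "I-CONDITION", "O"]

X = [[token_features(sent, i) for i in range(len(sent))]]
y = [labels]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, y)
print(crf.predict(X)[0])
```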
Subject(s)
Adverse Drug Reaction Reporting Systems; Data Mining/methods; Drug Labeling; United States Food and Drug Administration; Humans; Medication Systems; Natural Language Processing; Ohio; United States
ABSTRACT
The objective of this study was to determine whether the Food and Drug Administration's Adverse Event Reporting System (FAERS) data set could serve as the basis of automated electronic health record (EHR) monitoring for the adverse drug reaction (ADR) subset of adverse drug events. We retrospectively collected EHR entries for 71,909 pediatric inpatient visits at Cincinnati Children's Hospital Medical Center. Natural language processing (NLP) techniques were used to identify positive diseases/disorders and signs/symptoms (DDSSs) in the patients' clinical narratives. We downloaded all FAERS reports submitted by medical providers and extracted the reported drug-DDSS pairs. For each patient, we aligned the drug-DDSS pairs extracted from their clinical notes with the corresponding drug-DDSS pairs from the FAERS data set to identify Drug-Reaction Pair Sentences (DRPSs). The DRPSs were processed by NLP techniques to identify ADR-related DRPSs. We used clinician-annotated, real-world EHR data as the reference standard to evaluate the proposed algorithm. In evaluation, the algorithm achieved promising performance and showed great potential for accurately identifying ADRs in pediatric patients.
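A minimal sketch of the alignment idea: intersecting drug-DDSS pairs found in a patient's notes with drug-DDSS pairs reported in FAERS to nominate candidate DRPSs. The FAERS pairs and sentences below are invented placeholders:

```python
# Sketch: nominate candidate Drug-Reaction Pair Sentences (DRPSs) by
# intersecting note-derived drug-DDSS pairs with FAERS-reported pairs.
# All data below are invented placeholders.

faers_pairs = {("morphine", "pruritus"), ("vancomycin", "rash")}

# Each note sentence paired with the (drug, DDSS) pair extracted from it.
note_sentences = [
    ("Developed pruritus after morphine infusion.", ("morphine", "pruritus")),
    ("Vancomycin level drawn this morning.", ("vancomycin", "level")),
]

candidate_drpss = [
    sentence for sentence, pair in note_sentences if pair in faers_pairs
]
print(candidate_drpss)  # only the first sentence survives the alignment
```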
ABSTRACT
OBJECTIVES: (1) To develop an automated eligibility screening (ES) approach for clinical trials in an urban tertiary care pediatric emergency department (ED); (2) to assess the effectiveness of natural language processing (NLP), information extraction (IE), and machine learning (ML) techniques on real-world clinical data and trials. DATA AND METHODS: We collected eligibility criteria for 13 randomly selected, disease-specific clinical trials actively enrolling patients between January 1, 2010 and August 31, 2012. In parallel, we retrospectively selected data fields, including demographics, laboratory data, and clinical notes, from the electronic health record (EHR) to represent the profiles of all 202,795 patients visiting the ED during the same period. Leveraging NLP, IE, and ML technologies, the automated ES algorithms identified patients whose profiles matched the trial criteria, reducing the pool of candidates for staff screening. Performance was validated on both a physician-generated gold standard of trial-patient matches and a reference standard of historical trial-patient enrollment decisions, with workload, mean average precision (MAP), and recall assessed. RESULTS: Compared with screening without automation, automated ES reduced workload by 92% on the gold standard set, with a MAP of 62.9%. The automated ES achieved a 450% increase in trial screening efficiency. The findings on the gold standard set were confirmed by large-scale evaluation on the reference set of trial-patient matches. DISCUSSION AND CONCLUSION: By exploiting the text of trial criteria and the content of EHRs, we demonstrated that NLP-, IE-, and ML-based automated ES can successfully identify patients for clinical trials.
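Once NLP and IE have reduced both trial criteria and patient records to structured form, the candidate-filtering step can be as simple as the sketch below; the criterion fields and patient profiles are hypothetical, not the study's representation:

```python
# Sketch: structured eligibility filtering after NLP/IE normalization.
# The criterion fields and patient profiles are invented for illustration.

trial_criteria = {"min_age_years": 2, "max_age_years": 17, "diagnosis": "asthma"}

patients = [
    {"id": "A", "age_years": 5, "diagnoses": {"asthma", "eczema"}},
    {"id": "B", "age_years": 1, "diagnoses": {"asthma"}},       # too young
    {"id": "C", "age_years": 9, "diagnoses": {"pneumonia"}},    # wrong dx
]

def eligible(patient, criteria):
    """True if the patient profile satisfies every structured criterion."""
    return (criteria["min_age_years"] <= patient["age_years"]
            <= criteria["max_age_years"]
            and criteria["diagnosis"] in patient["diagnoses"])

shortlist = [p["id"] for p in patients if eligible(p, trial_criteria)]
print(shortlist)  # ['A'] -- only this reduced pool goes to staff screening
```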
Subject(s)
Artificial Intelligence; Clinical Trials as Topic; Eligibility Determination; Emergency Service, Hospital/organization & administration; Information Storage and Retrieval; Patient Selection; Efficiency, Organizational; Humans; Natural Language Processing
ABSTRACT
BACKGROUND: Early warning scores (EWS) are designed to identify early clinical deterioration by combining physiologic and/or laboratory measures into a quantified score. Current EWS leverage only a small fraction of electronic health record (EHR) content. The planned widespread implementation of EHRs brings the promise of abundant data resources for prediction purposes. The three specific aims of our research were: (1) to develop an EHR-based automated algorithm to predict the need for Pediatric Intensive Care Unit (PICU) transfer within the first 24 h of admission; (2) to evaluate the performance of the new algorithm on a held-out test data set; and (3) to compare the effectiveness of the new algorithm with that of two published Pediatric Early Warning Scores (PEWS). METHODS: The cases comprised 526 encounters with PICU transfer within 24 h of admission. In addition to the cases, we randomly selected 6772 control encounters from 62,516 inpatient admissions that were never transferred to the PICU. We used 29 variables in a logistic regression and compared our algorithm against the two published PEWS on a held-out test data set. RESULTS: The logistic regression algorithm achieved 0.849 (95% CI 0.753-0.945) sensitivity, 0.859 (95% CI 0.850-0.868) specificity, and 0.912 (95% CI 0.905-0.919) area under the curve (AUC) on the test set. Our algorithm's AUC was significantly higher than those of the two published PEWS, by 11.8% and 22.6%, respectively, on the test set. CONCLUSION: The novel algorithm achieved higher sensitivity, specificity, and AUC than the two PEWS reported in the literature.
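A minimal sketch of the modeling setup described above: a logistic regression over EHR-derived variables, evaluated by AUC on a held-out split. The data here are synthetic, and the study's 29 actual variables are not reproduced:

```python
# Sketch: logistic regression over 29 EHR-derived variables with held-out
# AUC evaluation. Data are synthetic; the study's variables are not shown.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 29))            # stand-in for 29 EHR variables
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1000) > 1.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("AUC: %.3f" % roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```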
Subject(s)
Algorithms; Artificial Intelligence; Child, Hospitalized; Health Services Needs and Demand; Intensive Care Units, Pediatric/organization & administration; Patient Transfer; Child; Female; Follow-Up Studies; Humans; Infant; Male; ROC Curve; Retrospective Studies; Severity of Illness Index
ABSTRACT
BACKGROUND: Although electronic health records (EHRs) have the potential to provide a foundation for quality and safety algorithms, few studies have measured their impact on automated adverse event (AE) and medical error (ME) detection within the neonatal intensive care unit (NICU) environment. OBJECTIVE: This paper presents two phenotyping AE and ME detection algorithms (i.e., IV infiltrations, and narcotic medication oversedation and dosing errors) and describes manual annotation of airway management and medication/fluid AEs from NICU EHRs. METHODS: From 753 NICU patient EHRs from 2011, we developed two automatic AE/ME detection algorithms and manually annotated 11 classes of AEs in 3263 clinical notes. The performance of the automatic AE/ME detection algorithms was compared to trigger tool and voluntary incident reporting results. AEs in clinical notes were double annotated, with consensus achieved under neonatologist supervision. Sensitivity, positive predictive value (PPV), and specificity are reported. RESULTS: Twelve severe IV infiltrates were detected. The algorithm identified one more infiltrate than the trigger tool and eight more than incident reporting. One narcotic oversedation was detected, demonstrating 100% agreement with the trigger tool. Additionally, 17 narcotic medication MEs were detected, an increase of 16 cases over voluntary incident reporting. CONCLUSIONS: Automated AE/ME detection algorithms provide higher sensitivity and PPV than currently used trigger tools or voluntary incident-reporting systems, including identification of potential dosing and frequency errors that current methods are unequipped to detect.
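For reference, the evaluation metrics reported above reduce to simple confusion-count arithmetic, as in this sketch (the counts are invented, not the study's results):

```python
# Sketch: sensitivity, PPV, and specificity from confusion counts obtained
# by comparing detector output to chart review. Counts below are invented.

def detection_metrics(tp, fp, fn, tn):
    """Standard detection metrics from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),   # detected fraction of real events
        "ppv": tp / (tp + fp),           # fraction of alerts that are real
        "specificity": tn / (tn + fp),   # correctly cleared non-events
    }

print(detection_metrics(tp=12, fp=3, fn=2, tn=700))
```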
Subject(s)
Airway Management/adverse effects; Algorithms; Electronic Health Records; Infusions, Intravenous/adverse effects; Medical Errors/adverse effects; Patient Safety; Humans; Infant, Newborn; Intensive Care Units, Neonatal; Medical Errors/prevention & control; Medication Errors/adverse effects; Predictive Value of Tests; Risk Management; Sensitivity and Specificity
ABSTRACT
OBJECTIVE: To present a series of experiments (1) to evaluate the impact of pre-annotation on the speed of manual annotation of clinical trial announcements and (2) to test for potential bias when pre-annotation is used. METHODS: To build the gold standard, 1400 clinical trial announcements from the clinicaltrials.gov website were randomly selected and double annotated for diagnoses, signs, symptoms, Unified Medical Language System (UMLS) Concept Unique Identifiers, and SNOMED CT codes. We used two dictionary-based methods to pre-annotate the text. We evaluated annotation time and potential bias through F-measures and ANOVA tests, applying Bonferroni correction. RESULTS: Time savings ranged from 13.85% to 21.5% per entity. Inter-annotator agreement (IAA) ranged from 93.4% to 95.5%. Pre-annotation produced no statistically significant difference in IAA or annotator performance. CONCLUSIONS: In every experiment pair, the annotator working with pre-annotated text needed less time than the annotator working with unlabeled text. The time savings were statistically significant. Moreover, pre-annotation did not reduce IAA or annotator performance. Dictionary-based pre-annotation is a feasible and practical method to reduce the cost of annotating clinical named entities in the eligibility sections of clinical trial announcements without introducing bias into the annotation process.
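A minimal sketch of dictionary-based pre-annotation as described above: greedy longest-match lookup of known terms, emitting spans for annotators to review or correct. The dictionary contents and labels are illustrative:

```python
# Sketch: dictionary-based pre-annotation via greedy longest-match lookup,
# emitting (start_token, end_token, label) spans for annotators to review.
# Dictionary contents and labels are invented for illustration.

DICT = {"diabetes mellitus": "DIAGNOSIS", "fever": "SYMPTOM"}
MAX_LEN = max(len(term.split()) for term in DICT)

def pre_annotate(text):
    tokens = text.lower().split()
    spans, i = [], 0
    while i < len(tokens):
        # Try the longest phrase starting at token i first.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in DICT:
                spans.append((i, i + n, DICT[phrase]))
                i += n
                break
        else:
            i += 1
    return spans

print(pre_annotate("History of diabetes mellitus and recurrent fever"))
# [(2, 4, 'DIAGNOSIS'), (6, 7, 'SYMPTOM')]
```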
Subject(s)
Clinical Trials as Topic; Information Storage and Retrieval/methods; Natural Language Processing; Analysis of Variance; Humans; Time and Motion Studies
ABSTRACT
OBJECTIVE: The goal of this work was to evaluate two machine learning methods, binary classification and sequence labeling, for medication-attribute linkage detection in two clinical corpora. DATA AND METHODS: We double annotated 3000 clinical trial announcements (CTA) and 1655 clinical notes (CN) for medication named entities and their attributes. A binary support vector machine (SVM) classification method with parsimonious feature sets and a conditional random fields (CRF)-based multi-layered sequence labeling (MLSL) model were proposed to identify the linkages between the entities and their corresponding attributes. We evaluated the systems' performance against the human-generated gold standard. RESULTS: The experiments showed that the two machine learning approaches performed statistically significantly better than the baseline rule-based approach. The binary SVM classification achieved a 0.94 F-measure with individual tokens as features. The SVM model trained on a parsimonious feature set achieved a 0.81 F-measure for CN and 0.87 for CTA. The CRF MLSL method achieved a 0.80 F-measure on both corpora. DISCUSSION AND CONCLUSIONS: We compared the novel MLSL method with a binary classification and a rule-based method. The MLSL method performed statistically significantly better than the rule-based method. However, the SVM-based binary classification method was statistically significantly better than the MLSL method for both the CTA and CN corpora. Using parsimonious feature sets, both the SVM-based binary classification and CRF-based MLSL methods achieved high performance in detecting medication name and attribute linkages in CTA and CN.
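The binary-classification framing can be sketched as follows: each candidate (medication, attribute) pair becomes one example labeled linked or not linked. The features and examples below are invented; the study's parsimonious feature set is not reproduced:

```python
# Sketch: medication-attribute linkage as binary classification. Each
# candidate (medication, attribute) pair is one example with simple token
# and distance features. Data and features are invented placeholders.
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

pairs = [
    {"med": "morphine", "attr": "2 mg", "token_gap": 1, "attr_type": "DOSE"},
    {"med": "morphine", "attr": "daily", "token_gap": 9, "attr_type": "FREQ"},
]
labels = [1, 0]  # 1 = attribute belongs to this medication mention

vec = DictVectorizer()           # one-hot encodes the string-valued features
X = vec.fit_transform(pairs)
clf = LinearSVC().fit(X, labels)
print(clf.predict(X))
```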
Subject(s)
Artificial Intelligence; Information Storage and Retrieval/methods; Medical Records; Pharmaceutical Preparations; Support Vector Machine; Clinical Trials as Topic; Humans
ABSTRACT
OBJECTIVE: To evaluate a proposed natural language processing (NLP) and machine learning-based automated method for risk-stratifying abdominal pain patients by analyzing the content of the electronic health record (EHR). METHODS: We analyzed the EHRs of a random sample of 2100 pediatric emergency department (ED) patients with abdominal pain, including all with a final diagnosis of appendicitis. We developed an automated system to extract relevant elements from ED physician notes and laboratory values and to automatically assign a risk category for acute appendicitis (high, equivocal, or low) based on the Pediatric Appendicitis Score. We evaluated the performance of the system against a manually created gold standard (chart reviews by ED physicians) for recall, specificity, and precision. RESULTS: The system achieved an average F-measure of 0.867 (0.869 recall and 0.863 precision) for risk classification, which was comparable to physician experts. Recall/precision were 0.897/0.952 in the low-risk category, 0.855/0.886 in the high-risk category, and 0.854/0.766 in the equivocal-risk category. The information the system required as input to achieve this F-measure was available within the first 4 h of the ED visit. CONCLUSIONS: Automated appendicitis risk categorization based on EHR content, including information from clinical notes, shows performance comparable to that of physician chart reviewers, as measured by their inter-annotator agreement, and represents a promising new approach to computerized decision support for promoting application of evidence-based medicine at the point of care.
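The final scoring step might look like the sketch below, which maps extracted findings onto the Pediatric Appendicitis Score and a risk category. The findings are invented, and the low/equivocal/high cutoffs are common choices in the literature rather than necessarily those used in the study:

```python
# Sketch: mapping extracted clinical elements onto the Pediatric
# Appendicitis Score (Samuel, 2002) and a risk category. The findings are
# invented; the cutoffs are common literature choices, not necessarily the
# study's own thresholds.

PAS_POINTS = {
    "anorexia": 1,
    "nausea_or_vomiting": 1,
    "migration_of_pain": 1,
    "fever": 1,
    "cough_percussion_hopping_tenderness": 2,
    "rlq_tenderness": 2,
    "leukocytosis": 1,      # WBC > 10,000/uL
    "neutrophilia": 1,      # left shift
}

def risk_category(findings):
    """Sum PAS points for positive findings and bucket into a risk tier."""
    score = sum(PAS_POINTS[f] for f in findings if findings[f])
    if score <= 3:
        return score, "low"
    if score >= 8:
        return score, "high"
    return score, "equivocal"

findings = {"anorexia": True, "rlq_tenderness": True, "leukocytosis": True,
            "fever": False, "nausea_or_vomiting": True}
print(risk_category(findings))  # (5, 'equivocal')
```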