Results 1 - 20 of 29
1.
Pediatr Emerg Care ; 35(12): 868-873, 2019 Dec.
Article in English | MEDLINE | ID: mdl-30281551

ABSTRACT

OBJECTIVE: Challenges in efficient patient recruitment, including sociotechnical barriers to clinical trials, are major obstacles to the timely and efficacious conduct of translational studies. We conducted a time-and-motion study to investigate the workflow of clinical trial enrollment in a pediatric emergency department. METHODS: We observed clinical research coordinators during 3 clinically staffed shifts, shadowing one clinical research coordinator at a time. Tasks were marked in 30-second intervals and annotated to include patient screening, patient contact, performing procedures, and physician contact. Statistical analysis was conducted on the patient enrollment activities. RESULTS: We conducted fifteen 120-minute observations from December 12, 2013, to January 3, 2014, and shadowed 8 clinical research coordinators. Patient screening took 31.62% of their time, patient contact took 18.67%, performing procedures took 17.6%, physician contact took 1%, and other activities took 31.0%. CONCLUSIONS: Screening patients for eligibility consumed the most time; automated screening methods could help reduce it. The findings suggest areas for improvement in recruitment planning to increase the efficiency of clinical trial enrollment.
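The task-tally arithmetic behind percentages like these can be sketched as follows; the labels and interval counts below are illustrative, not the study's data:

```python
from collections import Counter

def time_allocation(intervals):
    """Each observed 30-second interval carries one task label;
    return the percentage of total observed time spent per task."""
    counts = Counter(intervals)
    total = sum(counts.values())
    return {task: round(100 * n / total, 1) for task, n in counts.items()}

# Illustrative observation log: 10 intervals (5 minutes) of shadowing
log = ["screening"] * 4 + ["patient contact"] * 2 + ["procedures"] * 2 + ["other"] * 2
shares = time_allocation(log)
```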


Subject(s)
Eligibility Determination/methods , Emergency Service, Hospital/organization & administration , Mass Screening/methods , Child , Clinical Trials as Topic , Emergency Service, Hospital/standards , Humans , Patient Selection , Prospective Studies , Research Design , Time and Motion Studies , Workflow
2.
Biomed Inform Insights ; 9: 1178222617713018, 2017.
Article in English | MEDLINE | ID: mdl-28634427

ABSTRACT

The objective of this study was to determine whether the Food and Drug Administration's Adverse Event Reporting System (FAERS) data set could serve as the basis of automated electronic health record (EHR) monitoring for the adverse drug reaction (ADR) subset of adverse drug events. We retrospectively collected EHR entries for 71 909 pediatric inpatient visits at Cincinnati Children's Hospital Medical Center. Natural language processing (NLP) techniques were used to identify positive diseases/disorders and signs/symptoms (DDSSs) from the patients' clinical narratives. We downloaded all FAERS reports submitted by medical providers and extracted the reported drug-DDSS pairs. For each patient, we aligned the drug-DDSS pairs extracted from their clinical notes with the corresponding drug-DDSS pairs from the FAERS data set to identify Drug-Reaction Pair Sentences (DRPSs). The DRPSs were processed by NLP techniques to identify ADR-related DRPSs. We used clinician-annotated, real-world EHR data as the reference standard to evaluate the proposed algorithm. During evaluation, the algorithm achieved promising performance and showed great potential for identifying ADRs accurately in pediatric patients.
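The drug-DDSS alignment step can be sketched roughly as follows; the function name and toy pairs are hypothetical, not the authors' implementation:

```python
def find_drps_candidates(note_pairs, faers_pairs):
    """Keep only those (drug, DDSS) pairs from a patient's notes that also
    appear as reported pairs in the FAERS reference set -- candidate
    Drug-Reaction Pair Sentences for downstream NLP classification."""
    faers_set = {(d.lower(), s.lower()) for d, s in faers_pairs}
    return [(d, s) for d, s in note_pairs if (d.lower(), s.lower()) in faers_set]

# Toy data: only pairs corroborated by FAERS survive the alignment
note_pairs = [("ibuprofen", "rash"), ("ibuprofen", "fever"), ("amoxicillin", "rash")]
faers_pairs = [("Ibuprofen", "Rash"), ("Amoxicillin", "Diarrhea")]
candidates = find_drps_candidates(note_pairs, faers_pairs)
```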

3.
Int J Pediatr ; 2016: 4068582, 2016.
Article in English | MEDLINE | ID: mdl-27698673

ABSTRACT

Background and Objectives. The prevalence of severe obesity in children has doubled in the past decade. The objective of this study is to identify the clinical documentation of obesity in young children with a BMI ≥ 99th percentile at two large tertiary care pediatric hospitals. Methods. We used a standardized algorithm utilizing data from electronic health records to identify children with severe early onset obesity (BMI ≥ 99th percentile at age <6 years). We extracted descriptive terms and ICD-9 codes to evaluate documentation of obesity at Boston Children's Hospital and Cincinnati Children's Hospital Medical Center between 2007 and 2014. Results. A total of 9887 visit records of 2588 children with severe early onset obesity were identified. Based on predefined criteria for documentation of obesity, 21.5% of children (13.5% of visits) had positive documentation, which varied by institution. Documentation in children first seen under 2 years of age was lower than in older children (15% versus 26%). Documentation was significantly higher in girls (29% versus 17%, p < 0.001), African American children (27% versus 19% in whites, p < 0.001), and obesity-focused specialty clinics (70% versus 15% in primary care and 9% in other subspecialty clinics, p < 0.001). Conclusions. There is significant opportunity for improvement in documentation of obesity in young children, even years after the 2007 AAP guidelines for the management of obesity.

4.
Appl Clin Inform ; 7(3): 693-706, 2016 07 20.
Article in English | MEDLINE | ID: mdl-27452794

ABSTRACT

OBJECTIVE: The objective of this study is to develop an algorithm to accurately identify children with severe early onset childhood obesity (ages 1-5.99 years) using structured and unstructured data from the electronic health record (EHR). INTRODUCTION: Childhood obesity increases risk factors for cardiovascular morbidity and vascular disease. Accurate definition of a high-precision phenotype through a standardized tool is critical to the success of large-scale genomic studies and to validating rare monogenic variants causing severe early onset obesity. DATA AND METHODS: Rule-based and machine learning-based algorithms were developed using structured and unstructured data from two EHR databases from Boston Children's Hospital (BCH) and Cincinnati Children's Hospital Medical Center (CCHMC). Exclusion criteria, including medications or comorbid diagnoses, were defined. Machine learning algorithms were developed using cross-site training and testing, in addition to experimenting with natural language processing features. RESULTS: Precision was emphasized for a high-fidelity cohort. The rule-based algorithm performed best overall, with precision of 0.895 (CCHMC) and 0.770 (BCH). The best feature set for machine learning employed Unified Medical Language System (UMLS) concept unique identifiers (CUIs), ICD-9 codes, and RxNorm codes. CONCLUSIONS: Detecting severe early childhood obesity is essential given the intervention potential in children at the highest long-term risk of developing obesity-related comorbidities, and excluding patients with underlying pathological and non-syndromic causes of obesity assists in developing a high-precision cohort for genetic study. Further, such phenotyping efforts inform future practical applications in health care environments utilizing clinical decision support.


Subject(s)
Machine Learning , Pediatric Obesity/diagnosis , Tertiary Healthcare , Child , Child, Preschool , Comorbidity , Early Diagnosis , Female , Humans , Infant , Male , Pediatric Obesity/epidemiology
5.
PLoS One ; 11(7): e0159621, 2016.
Article in English | MEDLINE | ID: mdl-27472449

ABSTRACT

OBJECTIVE: Cohort selection is challenging for large-scale electronic health record (EHR) analyses, as International Classification of Diseases, 9th edition (ICD-9) diagnostic codes are notoriously unreliable disease predictors. Our objective was to develop, evaluate, and validate an automated algorithm for determining an Autism Spectrum Disorder (ASD) patient cohort from the EHR. We demonstrate its utility via the largest investigation to date of the co-occurrence patterns of medical comorbidities in ASD. METHODS: We extracted ICD-9 codes and concepts derived from the clinical notes. A gold standard patient set was labeled by clinicians at Boston Children's Hospital (BCH) (N = 150) and Cincinnati Children's Hospital Medical Center (CCHMC) (N = 152). Two algorithms were created: (1) a rule-based algorithm implementing the ASD criteria from the Diagnostic and Statistical Manual of Mental Disorders, 4th edition; (2) a predictive classifier. The positive predictive values (PPV) achieved by these algorithms were compared to an ICD-9 code baseline. We clustered the patients based on grouped ICD-9 codes and evaluated subgroups. RESULTS: The rule-based algorithm produced the best PPV: (a) BCH: 0.885 vs. 0.273 (baseline); (b) CCHMC: 0.840 vs. 0.645 (baseline); (c) combined: 0.864 vs. 0.460 (baseline). A validation at Children's Hospital of Philadelphia yielded a PPV of 0.848. Clustering analyses of comorbidities in the three-site cohort (N = 20,658 ASD patients) identified psychiatric, developmental, and seizure disorder clusters. CONCLUSIONS: In a large cross-institutional cohort, co-occurrence patterns of comorbidities in ASD provide further hypothesis-generating evidence for distinct courses in ASD. The proposed automated algorithms for cohort selection open avenues for other large-scale EHR studies and individualized treatment of ASD.
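A minimal sketch of the PPV comparison against an ICD-9 baseline, with invented patient IDs:

```python
def ppv(flagged, gold_positive):
    """Positive predictive value: fraction of algorithm-flagged patients
    confirmed positive in the clinician-labeled gold standard."""
    flagged, gold_positive = set(flagged), set(gold_positive)
    return len(flagged & gold_positive) / len(flagged)

# Toy example: both algorithms flag patients; the gold standard decides
gold = {"p1", "p2", "p3"}
rule_based = {"p1", "p2"}           # both confirmed
icd9_baseline = {"p1", "p4", "p5"}  # one of three confirmed
ppv_rule = ppv(rule_based, gold)
ppv_icd9 = ppv(icd9_baseline, gold)
```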


Subject(s)
Algorithms , Autism Spectrum Disorder/diagnosis , Electronic Health Records , Child , Child, Preschool , Cohort Studies , Female , Humans , Male
6.
J Am Med Inform Assoc ; 23(4): 671-80, 2016 07.
Article in English | MEDLINE | ID: mdl-27121609

ABSTRACT

OBJECTIVE: (1) To develop an automated algorithm to predict a patient's response (ie, if the patient agrees or declines) before he/she is approached for a clinical trial invitation; (2) to assess the algorithm performance and the predictors on real-world patient recruitment data for a diverse set of clinical trials in a pediatric emergency department; and (3) to identify directions for future studies in predicting patients' participation response. MATERIALS AND METHODS: We collected 3345 patients' response to trial invitations on 18 clinical trials at one center that were actively enrolling patients between January 1, 2010 and December 31, 2012. In parallel, we retrospectively extracted demographic, socioeconomic, and clinical predictors from multiple sources to represent the patients' profiles. Leveraging machine learning methodology, the automated algorithms predicted participation response for individual patients and identified influential features associated with their decision-making. The performance was validated on the collection of actual patient response, where precision, recall, F-measure, and area under the ROC curve were assessed. RESULTS: Compared to the random response predictor that simulated the current practice, the machine learning algorithms achieved significantly better performance (Precision/Recall/F-measure/area under the ROC curve: 70.82%/92.02%/80.04%/72.78% on 10-fold cross validation and 71.52%/92.68%/80.74%/75.74% on the test set). By analyzing the significant features output by the algorithms, the study confirmed several literature findings and identified challenges that could be mitigated to optimize recruitment. CONCLUSION: By exploiting predictive variables from multiple sources, we demonstrated that machine learning algorithms have great potential in improving the effectiveness of the recruitment process by automatically predicting patients' participation response to trial invitations.
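The reported precision/recall/F-measure can be computed as below; this is a generic sketch, not the authors' evaluation code:

```python
def prf(y_true, y_pred):
    """Precision, recall and F-measure for binary participation response
    (1 = patient agrees, 0 = patient declines)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f

# Toy responses: 6 approached patients, predicted vs. actual agreement
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1]
p, r, f = prf(y_true, y_pred)
```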


Subject(s)
Clinical Trials as Topic , Emergency Service, Hospital , Machine Learning , Patient Participation , Algorithms , Child , Female , Hospitals, Pediatric , Humans , Male , Patient Acceptance of Health Care
7.
J Biomed Inform ; 57: 124-33, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26190267

ABSTRACT

OBJECTIVE: To improve neonatal patient safety through automated detection of medication administration errors (MAEs) in high alert medications including narcotics, vasoactive medication, intravenous fluids, parenteral nutrition, and insulin using the electronic health record (EHR); to evaluate rates of MAEs in neonatal care; and to compare the performance of computerized algorithms to traditional incident reporting for error detection. METHODS: We developed novel computerized algorithms to identify MAEs within the EHR of all neonatal patients treated in a level four neonatal intensive care unit (NICU) in 2011 and 2012. We evaluated the rates and types of MAEs identified by the automated algorithms and compared their performance to incident reporting. Performance was evaluated by physician chart review. RESULTS: In the combined 2011 and 2012 NICU data sets, the automated algorithms identified MAEs at the following rates: fentanyl, 0.4% (4 errors/1005 fentanyl administration records); morphine, 0.3% (11/4009); dobutamine, 0 (0/10); and milrinone, 0.3% (5/1925). We found higher MAE rates for other vasoactive medications including: dopamine, 11.6% (5/43); epinephrine, 10.0% (289/2890); and vasopressin, 12.8% (54/421). Fluid administration error rates were similar: intravenous fluids, 3.2% (273/8567); parenteral nutrition, 3.2% (649/20124); and lipid administration, 1.3% (203/15227). We also found 13 insulin administration errors with a resulting rate of 2.9% (13/456). MAE rates were higher for medications that were adjusted frequently and fluids administered concurrently. The algorithms identified many previously unidentified errors, demonstrating significantly better sensitivity (82% vs. 5%) and precision (70% vs. 50%) than incident reporting for error recognition. CONCLUSIONS: Automated detection of medication administration errors through the EHR is feasible and performs better than currently used incident reporting systems. Automated algorithms may be useful for real-time error identification and mitigation.
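One simple form of such an automated check, flagging doses that deviate from the order beyond a tolerance, can be sketched as follows; the record fields and the 10% tolerance are illustrative assumptions, not the paper's rules:

```python
def flag_administration_errors(records, tolerance=0.1):
    """Flag administration records whose given dose deviates from the
    ordered dose by more than the stated relative tolerance.
    Field names and the 10% tolerance are illustrative assumptions."""
    errors = []
    for rec in records:
        ordered, given = rec["ordered_dose"], rec["given_dose"]
        if abs(given - ordered) > tolerance * ordered:
            errors.append(rec["id"])
    return errors

records = [
    {"id": "a1", "ordered_dose": 2.0, "given_dose": 2.0},  # exact match
    {"id": "a2", "ordered_dose": 2.0, "given_dose": 2.1},  # within tolerance
    {"id": "a3", "ordered_dose": 2.0, "given_dose": 3.0},  # 50% over: flagged
]
flagged = flag_administration_errors(records)
```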


Subject(s)
Analgesics, Opioid/therapeutic use , Intensive Care Units, Neonatal , Medication Errors , Patient Safety , Risk Management , Automation , Humans , Infant, Newborn , Intensive Care, Neonatal , Medical Order Entry Systems
8.
BMC Med Inform Decis Mak ; 15: 37, 2015 May 06.
Article in English | MEDLINE | ID: mdl-25943550

ABSTRACT

BACKGROUND: In this study we implemented and developed state-of-the-art machine learning (ML) and natural language processing (NLP) technologies and built a computerized algorithm for medication reconciliation. Our specific aims are: (1) to develop a computerized algorithm for medication discrepancy detection between patients' discharge prescriptions (structured data) and medications documented in free-text clinical notes (unstructured data); and (2) to assess the performance of the algorithm on real-world medication reconciliation data. METHODS: We collected clinical notes and discharge prescription lists for all 271 patients enrolled in the Complex Care Medical Home Program at Cincinnati Children's Hospital Medical Center between 1/1/2010 and 12/31/2013. A double-annotated, gold-standard set of medication reconciliation data was created for this collection. We then developed a hybrid algorithm consisting of three processes: (1) an ML algorithm to identify medication entities from clinical notes, (2) a rule-based method to link medication names with their attributes, and (3) an NLP-based, hybrid approach to match medications with structured prescriptions in order to detect medication discrepancies. The performance was validated on the gold-standard medication reconciliation data, where precision (P), recall (R), F-value (F) and workload were assessed. RESULTS: The hybrid algorithm achieved 95.0%/91.6%/93.3% of P/R/F on medication entity detection and 98.7%/99.4%/99.1% of P/R/F on attribute linkage. The medication matching achieved 92.4%/90.7%/91.5% (P/R/F) on identifying matched medications in the gold-standard and 88.6%/82.5%/85.5% (P/R/F) on discrepant medications. By combining all processes, the algorithm achieved 92.4%/90.7%/91.5% (P/R/F) and 71.5%/65.2%/68.2% (P/R/F) on identifying the matched and the discrepant medications, respectively. The error analysis on algorithm outputs identified challenges to be addressed in order to improve medication discrepancy detection. CONCLUSION: By leveraging ML and NLP technologies, an end-to-end, computerized algorithm achieves promising outcome in reconciling medications between clinical notes and discharge prescriptions.
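The medication-matching step can be sketched at the name level as follows; this is a hypothetical simplification of the paper's hybrid NLP matching, which also compares attributes such as dose and frequency:

```python
def reconcile(note_meds, discharge_meds):
    """Split note-documented medications into those matched to the
    structured discharge prescription list and those discrepant
    (name-level matching only; attribute comparison is omitted)."""
    discharge = {m.lower() for m in discharge_meds}
    matched = [m for m in note_meds if m.lower() in discharge]
    discrepant = [m for m in note_meds if m.lower() not in discharge]
    return matched, discrepant

# Toy lists: one note medication is missing from the discharge prescriptions
note_meds = ["Albuterol", "Ranitidine", "Baclofen"]
discharge_meds = ["albuterol", "baclofen"]
matched, discrepant = reconcile(note_meds, discharge_meds)
```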


Subject(s)
Algorithms , Drug Prescriptions/standards , Machine Learning , Medication Reconciliation/standards , Natural Language Processing , Patient Discharge/standards , Adult , Humans
9.
BMC Med Inform Decis Mak ; 15: 28, 2015 Apr 14.
Article in English | MEDLINE | ID: mdl-25881112

ABSTRACT

BACKGROUND: Manual eligibility screening (ES) for a clinical trial typically requires a labor-intensive review of patient records that utilizes many resources. Leveraging state-of-the-art natural language processing (NLP) and information extraction (IE) technologies, we sought to improve the efficiency of physician decision-making in clinical trial enrollment. In order to markedly reduce the pool of potential candidates for staff screening, we developed an automated ES algorithm to identify patients who meet core eligibility characteristics of an oncology clinical trial. METHODS: We collected narrative eligibility criteria from ClinicalTrials.gov for 55 clinical trials actively enrolling oncology patients in our institution between 12/01/2009 and 10/31/2011. In parallel, our ES algorithm extracted clinical and demographic information from the Electronic Health Record (EHR) data fields to represent profiles of all 215 oncology patients admitted to cancer treatment during the same period. The automated ES algorithm then matched the trial criteria with the patient profiles to identify potential trial-patient matches. Matching performance was validated on a reference set of 169 historical trial-patient enrollment decisions, and workload, precision, recall, negative predictive value (NPV) and specificity were calculated. RESULTS: Without automation, an oncologist would need to review 163 patients per trial on average to replicate the historical patient enrollment for each trial. This workload is reduced by 85% to 24 patients when using automated ES (precision/recall/NPV/specificity: 12.6%/100.0%/100.0%/89.9%). Without automation, an oncologist would need to review 42 trials per patient on average to replicate the patient-trial matches that occur in the retrospective data set. With automated ES this workload is reduced by 90% to four trials (precision/recall/NPV/specificity: 35.7%/100.0%/100.0%/95.5%). CONCLUSION: By leveraging NLP and IE technologies, automated ES could dramatically increase the trial screening efficiency of oncologists and enable participation of small practices, which are often left out of trial enrollment. The algorithm has the potential to significantly reduce the effort to execute clinical research at a point in time when new initiatives of the cancer care community intend to greatly expand both the access to trials and the number of available trials.
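The core screening idea, filtering patient profiles against structured trial criteria to shrink the review pool, can be sketched as follows; the field names and criteria are illustrative, not the algorithm's actual representation:

```python
def screen(patients, criteria):
    """Reduce the candidate pool by keeping only patients whose structured
    profile satisfies every core criterion (here: an age window plus a
    required diagnosis); field names are illustrative."""
    lo, hi = criteria["age_range"]
    dx = criteria["diagnosis"]
    return [p["id"] for p in patients
            if lo <= p["age"] <= hi and dx in p["diagnoses"]]

# Toy profiles against a hypothetical pediatric leukemia trial
patients = [
    {"id": "p1", "age": 7,  "diagnoses": {"ALL"}},
    {"id": "p2", "age": 15, "diagnoses": {"ALL", "asthma"}},
    {"id": "p3", "age": 4,  "diagnoses": {"asthma"}},
]
trial = {"age_range": (5, 17), "diagnosis": "ALL"}
candidates = screen(patients, trial)
```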


Subject(s)
Clinical Trials as Topic/methods , Eligibility Determination/methods , Information Storage and Retrieval/methods , Natural Language Processing , Neoplasms/therapy , Patient Selection , Child , Humans
10.
J Am Med Inform Assoc ; 22(1): 166-78, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25030032

ABSTRACT

OBJECTIVES: (1) To develop an automated eligibility screening (ES) approach for clinical trials in an urban tertiary care pediatric emergency department (ED); (2) to assess the effectiveness of natural language processing (NLP), information extraction (IE), and machine learning (ML) techniques on real-world clinical data and trials. DATA AND METHODS: We collected eligibility criteria for 13 randomly selected, disease-specific clinical trials actively enrolling patients between January 1, 2010 and August 31, 2012. In parallel, we retrospectively selected data fields including demographics, laboratory data, and clinical notes from the electronic health record (EHR) to represent profiles of all 202795 patients visiting the ED during the same period. Leveraging NLP, IE, and ML technologies, the automated ES algorithms identified patients whose profiles matched the trial criteria to reduce the pool of candidates for staff screening. The performance was validated on both a physician-generated gold standard of trial-patient matches and a reference standard of historical trial-patient enrollment decisions, where workload, mean average precision (MAP), and recall were assessed. RESULTS: Compared with the case without automation, the workload with automated ES was reduced by 92% on the gold standard set, with a MAP of 62.9%. The automated ES achieved a 450% increase in trial screening efficiency. The findings on the gold standard set were confirmed by large-scale evaluation on the reference set of trial-patient matches. DISCUSSION AND CONCLUSION: By exploiting the text of trial criteria and the content of EHRs, we demonstrated that NLP-, IE-, and ML-based automated ES could successfully identify patients for clinical trials.
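Mean average precision (MAP), the ranking metric reported here, can be computed as below; this is a generic sketch with invented IDs, not the study's evaluation code:

```python
def average_precision(ranked_ids, relevant):
    """Average precision over one ranked candidate list: mean of the
    precision values at each rank where a relevant item appears."""
    hits, precisions = 0, []
    for rank, pid in enumerate(ranked_ids, start=1):
        if pid in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant)

def mean_average_precision(runs):
    """MAP across trials: each run is (ranked candidate list, relevant set)."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

runs = [(["p1", "p2", "p3"], {"p1", "p3"}),  # AP = (1/1 + 2/3) / 2 = 5/6
        (["p4", "p5"], {"p5"})]              # AP = 1/2
map_score = mean_average_precision(runs)
```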


Subject(s)
Artificial Intelligence , Clinical Trials as Topic , Eligibility Determination , Emergency Service, Hospital/organization & administration , Information Storage and Retrieval , Patient Selection , Efficiency, Organizational , Humans , Natural Language Processing
11.
Resuscitation ; 85(8): 1065-71, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24813568

ABSTRACT

BACKGROUND: Early warning scores (EWS) are designed to identify early clinical deterioration by combining physiologic and/or laboratory measures to generate a quantified score. Current EWS leverage only a small fraction of Electronic Health Record (EHR) content. The planned widespread implementation of EHRs brings the promise of abundant data resources for prediction purposes. The three specific aims of our research are: (1) to develop an EHR-based automated algorithm to predict the need for Pediatric Intensive Care Unit (PICU) transfer in the first 24 h of admission; (2) to evaluate the performance of the new algorithm on a held-out test data set; and (3) to compare the effectiveness of the new algorithm with that of two published Pediatric Early Warning Scores (PEWS). METHODS: The cases comprised 526 encounters with PICU transfer within 24 h of admission. In addition to the cases, we randomly selected 6772 control encounters from 62516 inpatient admissions that were never transferred to the PICU. We used 29 variables in a logistic regression and compared our algorithm against the two published PEWS on a held-out test data set. RESULTS: The logistic regression algorithm achieved 0.849 (95% CI 0.753-0.945) sensitivity, 0.859 (95% CI 0.850-0.868) specificity, and 0.912 (95% CI 0.905-0.919) area under the curve (AUC) in the test set. Our algorithm's AUC was significantly higher, by 11.8% and 22.6% in the test set, than those of the two published PEWS. CONCLUSION: The novel algorithm achieved higher sensitivity, specificity, and AUC than the two PEWS reported in the literature.
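A minimal logistic-regression sketch of this kind of transfer prediction, using two invented binary vitals features in place of the study's 29 variables:

```python
import math

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Minimal stochastic-gradient-descent logistic regression;
    an illustrative stand-in for the paper's 29-variable model."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - yi  # gradient of log-loss w.r.t. z
            b -= lr * err
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
    return w, b

def predict(w, b, xi):
    """Predicted probability of PICU transfer for one encounter."""
    z = b + sum(wj * xj for wj, xj in zip(w, xi))
    return 1.0 / (1.0 + math.exp(-z))

# Toy features: [elevated heart rate, low oxygen saturation] -> transfer label
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]
w, b = train_logistic(X, y)
```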


Subject(s)
Algorithms , Artificial Intelligence , Child, Hospitalized , Health Services Needs and Demand , Intensive Care Units, Pediatric/organization & administration , Patient Transfer , Child , Female , Follow-Up Studies , Humans , Infant , Male , ROC Curve , Retrospective Studies , Severity of Illness Index
12.
J Biomed Inform ; 50: 173-183, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24556292

ABSTRACT

OBJECTIVE: The current study aims to fill the gap in available healthcare de-identification resources by creating a new sharable dataset with realistic Protected Health Information (PHI) without reducing the value of the data for de-identification research. By releasing the annotated gold standard corpus under a Data Use Agreement, we would like to encourage other computational linguists to experiment with our data and develop new machine learning models for de-identification. This paper describes: (1) the modifications required by the Institutional Review Board before sharing the de-identification gold standard corpus; (2) our efforts to keep the PHI as realistic as possible; and (3) the tests to show the effectiveness of these efforts in preserving the value of the modified data set for machine learning model development. MATERIALS AND METHODS: In a previous study we built an original de-identification gold standard corpus annotated with true PHI from 3503 randomly selected clinical notes for the 22 most frequent clinical note types of our institution. In the current study we modified the original gold standard corpus to make it suitable for external sharing by replacing HIPAA-specified PHI with newly generated, realistic PHI. Finally, we evaluated the research value of this new dataset by comparing the performance of an existing published in-house de-identification system trained on the new de-identification gold standard corpus with the performance of the same system trained on the original corpus. We assessed the potential benefits of using the new de-identification gold standard corpus to identify PHI in the i2b2 and PhysioNet datasets that were released by other groups for de-identification research. We also measured the effectiveness of the i2b2 and PhysioNet de-identification gold standard corpora in identifying PHI in our original clinical notes. RESULTS: Performance of the de-identification system using the new gold standard corpus as a training set was very close to that of training on the original corpus (92.56 vs. 93.48 overall F-measure). The best i2b2/PhysioNet/CCHMC cross-training performances were obtained when training on the new shared CCHMC gold standard corpus, although performances were still lower than corpus-specific training. DISCUSSION AND CONCLUSION: We successfully modified a de-identification dataset for external sharing while preserving its de-identification research value, with a limited drop in machine learning de-identification performance.
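The surrogate-PHI idea, replacing real identifiers with realistic stand-ins while preserving the text's shape, can be sketched as follows; the regexes, name map, and date offset are illustrative, not the corpus-modification procedure the authors used:

```python
import re
from datetime import date, timedelta

def surrogate_phi(text, name_map, date_shift_days=100):
    """Replace annotated PHI with realistic surrogates: map real names to
    invented ones and shift MM/DD/YYYY dates by a fixed offset, so the
    note keeps its original shape for model training."""
    for real, fake in name_map.items():
        text = re.sub(re.escape(real), fake, text)

    def shift(m):
        mth, day, yr = int(m.group(1)), int(m.group(2)), int(m.group(3))
        d = date(yr, mth, day) + timedelta(days=date_shift_days)
        return d.strftime("%m/%d/%Y")

    return re.sub(r"(\d{2})/(\d{2})/(\d{4})", shift, text)

note = "Seen by Dr. John Smith on 01/15/2013."
out = surrogate_phi(note, {"John Smith": "Alan Brown"})
```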


Subject(s)
Medical Informatics , Computer Security , Electronic Health Records , Health Insurance Portability and Accountability Act , United States
13.
J Am Med Inform Assoc ; 21(5): 776-84, 2014.
Article in English | MEDLINE | ID: mdl-24401171

ABSTRACT

BACKGROUND: Although electronic health records (EHRs) have the potential to provide a foundation for quality and safety algorithms, few studies have measured their impact on automated adverse event (AE) and medical error (ME) detection within the neonatal intensive care unit (NICU) environment. OBJECTIVE: This paper presents two phenotyping AE and ME detection algorithms (ie, IV infiltrations, narcotic medication oversedation and dosing errors) and describes manual annotation of airway management and medication/fluid AEs from NICU EHRs. METHODS: From 753 NICU patient EHRs from 2011, we developed two automatic AE/ME detection algorithms, and manually annotated 11 classes of AEs in 3263 clinical notes. Performance of the automatic AE/ME detection algorithms was compared to trigger tool and voluntary incident reporting results. AEs in clinical notes were double annotated and consensus achieved under neonatologist supervision. Sensitivity, positive predictive value (PPV), and specificity are reported. RESULTS: Twelve severe IV infiltrates were detected. The algorithm identified one more infiltrate than the trigger tool and eight more than incident reporting. One narcotic oversedation was detected demonstrating 100% agreement with the trigger tool. Additionally, 17 narcotic medication MEs were detected, an increase of 16 cases over voluntary incident reporting. CONCLUSIONS: Automated AE/ME detection algorithms provide higher sensitivity and PPV than currently used trigger tools or voluntary incident-reporting systems, including identification of potential dosing and frequency errors that current methods are unequipped to detect.


Subject(s)
Airway Management/adverse effects , Algorithms , Electronic Health Records , Infusions, Intravenous/adverse effects , Medical Errors/adverse effects , Patient Safety , Humans , Infant, Newborn , Intensive Care Units, Neonatal , Medical Errors/prevention & control , Medication Errors/adverse effects , Predictive Value of Tests , Risk Management , Sensitivity and Specificity
14.
J Am Med Inform Assoc ; 21(3): 406-13, 2014.
Article in English | MEDLINE | ID: mdl-24001514

ABSTRACT

OBJECTIVE: To present a series of experiments: (1) to evaluate the impact of pre-annotation on the speed of manual annotation of clinical trial announcements; and (2) to test for potential bias, if pre-annotation is utilized. METHODS: To build the gold standard, 1400 clinical trial announcements from the clinicaltrials.gov website were randomly selected and double annotated for diagnoses, signs, symptoms, Unified Medical Language System (UMLS) Concept Unique Identifiers, and SNOMED CT codes. We used two dictionary-based methods to pre-annotate the text. We evaluated the annotation time and potential bias through F-measures and ANOVA tests and implemented Bonferroni correction. RESULTS: Time savings ranged from 13.85% to 21.5% per entity. Inter-annotator agreement (IAA) ranged from 93.4% to 95.5%. There was no statistically significant difference for IAA and annotator performance in pre-annotations. CONCLUSIONS: On every experiment pair, the annotator with the pre-annotated text needed less time to annotate than the annotator with non-labeled text. The time savings were statistically significant. Moreover, the pre-annotation did not reduce the IAA or annotator performance. Dictionary-based pre-annotation is a feasible and practical method to reduce the cost of annotation of clinical named entity recognition in the eligibility sections of clinical trial announcements without introducing bias in the annotation process.


Subject(s)
Clinical Trials as Topic , Information Storage and Retrieval/methods , Natural Language Processing , Analysis of Variance , Humans , Time and Motion Studies
15.
Front Genet ; 4: 268, 2013.
Article in English | MEDLINE | ID: mdl-24348519

ABSTRACT

UNLABELLED: Common variations at the loci harboring the fat mass and obesity gene (FTO), MC4R, and TMEM18 are consistently reported as associated with obesity and body mass index (BMI), especially in adult populations. To confirm this effect in a pediatric population, five European-ancestry cohorts from the pediatric eMERGE-II network (CCHMC-BCH) were evaluated. METHOD: Data on 5049 samples of European ancestry were obtained from the Electronic Medical Records (EMRs) of two large academic centers in five different genotyped cohorts. For all available samples, gender, age, height, and weight were collected and BMI was calculated. To account for age and sex differences in BMI, BMI z-scores were generated using the 2000 Centers for Disease Control and Prevention (CDC) growth charts. A genome-wide association study (GWAS) was performed with the BMI z-score. After removing missing data and outliers based on principal components (PC) analyses, 2860 samples were used for the GWAS study. The association between each single nucleotide polymorphism (SNP) and BMI was tested using linear regression adjusting for age, gender, and PC by cohort. The effects of SNPs were modeled assuming additive, recessive, and dominant effects of the minor allele. Meta-analysis was conducted using a weighted z-score approach. RESULTS: The mean age of subjects was 9.8 years (range 2-19). The proportion of male subjects was 56%. In these cohorts, 14% of samples had a BMI ≥ 95th percentile and 28% ≥ 85th percentile. Meta-analyses produced a signal at the 16q12 genomic region, with the best result of p = 1.43 × 10⁻⁷ (p_rec = 7.34 × 10⁻⁸) for the SNP rs8050136 at the first intron of the FTO gene (z = 5.26) and with no heterogeneity between cohorts (p = 0.77). Under a recessive model, another published SNP at this locus, rs1421085, generates the best result (z = 5.782, p_rec = 8.21 × 10⁻⁹). Imputation in this region using dense 1000-Genome and HapMap CEU samples revealed 71 SNPs with p < 10⁻⁶, all at the first intron of the FTO locus. When heterogeneity was permitted between cohorts, signals were also obtained in other previously identified loci, including MC4R (rs12964056, p = 6.87 × 10⁻⁷, z = -4.98), cholecystokinin (CCK) (rs8192472, p = 1.33 × 10⁻⁶, z = -4.85), interleukin 15 (rs2099884, p = 1.27 × 10⁻⁵, z = 4.34), low-density lipoprotein receptor-related protein 1B (LRP1B) (rs7583748, p = 0.00013, z = -3.81), and near transmembrane protein 18 (TMEM18) (rs7561317, p = 0.001, z = -3.17). We also detected a novel locus at chromosome 3 at COL6A5 (best SNP = rs1542829, minor allele frequency (MAF) of 5%, p = 4.35 × 10⁻⁹, z = 5.89). CONCLUSION: An EMR-linked cohort study demonstrates that BMI z-score measurements can be successfully extracted and linked to genomic data with meaningful confirmatory results. We verified the high prevalence of childhood overweight and obesity in our cohort (28%). In addition, our data indicate that genetic variants in the first intron of FTO, a known adult genetic risk factor for BMI, are also robustly associated with BMI in the pediatric population.
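BMI z-scores against the CDC growth charts are computed with the LMS transformation; a sketch with invented L, M, S parameters (the real values are tabulated by age and sex):

```python
def bmi_zscore(bmi, L, M, S):
    """CDC growth-chart LMS transformation: z = ((BMI/M)**L - 1) / (L*S).
    L (skewness), M (median) and S (coefficient of variation) depend on
    age and sex; the values used below are illustrative, not CDC values."""
    return ((bmi / M) ** L - 1.0) / (L * S)

# Illustrative LMS parameters for one age/sex stratum
L_, M_, S_ = -2.0, 16.0, 0.1
z_at_median = bmi_zscore(16.0, L_, M_, S_)  # BMI at the median maps to z = 0
z_high = bmi_zscore(20.0, L_, M_, S_)       # BMI above the median maps to z > 0
```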

16.
J Am Med Inform Assoc ; 20(e2): e212-20, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24130231

ABSTRACT

OBJECTIVE: To evaluate a proposed natural language processing (NLP) and machine learning-based automated method to risk-stratify abdominal pain patients by analyzing the content of the electronic health record (EHR). METHODS: We analyzed the EHRs of a random sample of 2100 pediatric emergency department (ED) patients with abdominal pain, including all with a final diagnosis of appendicitis. We developed an automated system to extract relevant elements from ED physician notes and laboratory values and to automatically assign a risk category for acute appendicitis (high, equivocal, or low) based on the Pediatric Appendicitis Score. We evaluated the performance of the system against a manually created gold standard (chart reviews by ED physicians) for recall, specificity, and precision. RESULTS: The system achieved an average F-measure of 0.867 (0.869 recall and 0.863 precision) for risk classification, which was comparable to that of physician experts. Recall/precision were 0.897/0.952 in the low-risk category, 0.855/0.886 in the high-risk category, and 0.854/0.766 in the equivocal-risk category. The information that the system required as input to achieve a high F-measure was available within the first 4 hours of the ED visit. CONCLUSIONS: Automated appendicitis risk categorization based on EHR content, including information from clinical notes, shows performance comparable to that of physician chart reviewers, as measured by their inter-annotator agreement, and represents a promising new approach for computerized decision support to promote application of evidence-based medicine at the point of care.
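For readers unfamiliar with the score the system automates, the risk stratification can be sketched as below. The component weights follow the published Pediatric Appendicitis Score; the low/equivocal/high cut-offs (≤3 / 4-7 / ≥8) are commonly cited values and are an assumption here, since the abstract does not state the paper's exact thresholds.

```python
# Pediatric Appendicitis Score (PAS): eight findings weighted 1 or 2 points,
# maximum 10. Cut-offs below are illustrative, not necessarily the paper's.
PAS_WEIGHTS = {
    "migration_of_pain": 1,
    "anorexia": 1,
    "nausea_or_vomiting": 1,
    "rlq_tenderness": 2,                      # right lower quadrant tenderness
    "pain_with_cough_percussion_hopping": 2,
    "fever": 1,
    "leukocytosis": 1,                        # elevated white blood cell count
    "neutrophilia": 1,                        # left shift
}

def pas_risk(findings):
    """Map a dict of boolean findings to a (score, risk category) pair."""
    score = sum(w for name, w in PAS_WEIGHTS.items() if findings.get(name))
    if score <= 3:
        return score, "low"
    if score >= 8:
        return score, "high"
    return score, "equivocal"
```

In the study, the NLP pipeline's job is to fill in the `findings` dict from free-text notes and laboratory values before a rule like this assigns the category.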


Subject(s)
Abdominal Pain/etiology, Algorithms, Appendicitis/diagnosis, Electronic Health Records, Natural Language Processing, Artificial Intelligence, Child, Emergency Service, Hospital, Humans, Risk Assessment/methods
17.
J Med Internet Res ; 15(4): e73, 2013 Apr 02.
Article in English | MEDLINE | ID: mdl-23548263

ABSTRACT

BACKGROUND: A high-quality gold standard is vital for supervised, machine learning-based, clinical natural language processing (NLP) systems. In clinical NLP projects, expert annotators traditionally create the gold standard. However, traditional annotation is expensive and time-consuming. To reduce the cost of annotation, general NLP projects have turned to crowdsourcing based on Web 2.0 technology, which involves submitting smaller subtasks to a coordinated marketplace of workers on the Internet. Many studies have been conducted in the area of crowdsourcing, but only a few have focused on tasks in the general NLP field and only a handful in the biomedical domain, usually based upon very small pilot sample sizes. In addition, the quality of the crowdsourced biomedical NLP corpora was never exceptional when compared to traditionally developed gold standards. The previously reported results on a medical named entity annotation task showed a 0.68 F-measure-based agreement between crowdsourced and traditionally developed corpora. OBJECTIVE: Building upon previous work from the general crowdsourcing research, this study investigated the usability of crowdsourcing in the clinical NLP domain with special emphasis on achieving high agreement between crowdsourced and traditionally developed corpora. METHODS: To build the gold standard for evaluating the crowdsourcing workers' performance, 1042 clinical trial announcements (CTAs) from the ClinicalTrials.gov website were randomly selected and double annotated for medication names, medication types, and linked attributes. For the experiments, we used CrowdFlower, an Amazon Mechanical Turk-based crowdsourcing platform. We calculated sensitivity, precision, and F-measure to evaluate the quality of the crowd's work and tested the statistical significance (P<.001, chi-square test) to detect differences between the crowdsourced and traditionally developed annotations.
RESULTS: The agreement between the crowd's annotations and the traditionally generated corpora was high for: (1) annotations (0.87, F-measure for medication names; 0.73, medication types), (2) correction of previous annotations (0.90, medication names; 0.76, medication types), and excellent for (3) linking medications with their attributes (0.96). Simple voting provided the best judgment aggregation approach. There was no statistically significant difference between the crowd and traditionally generated corpora. Our results showed a 27.9% improvement over previously reported results on the medication named entity annotation task. CONCLUSIONS: This study offers three contributions. First, we proved that crowdsourcing is a feasible, inexpensive, fast, and practical approach to collect high-quality annotations for clinical text (when protected health information is excluded). We believe that well-designed user interfaces and a rigorous quality control strategy for entity annotation and linking were critical to the success of this work. Second, as a further contribution to the Internet-based crowdsourcing field, we will publicly release the JavaScript and CrowdFlower Markup Language infrastructure code that is necessary to utilize CrowdFlower's quality control and crowdsourcing interfaces for named entity annotations. Finally, to spur future research, we will release the CTA annotations that were generated by traditional and crowdsourced approaches.
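The "simple voting" aggregation that performed best can be illustrated with a minimal sketch; the label values below are made up for the example.

```python
from collections import Counter

def majority_vote(judgments):
    """Aggregate one annotation unit's crowd judgments by simple voting.
    Ties resolve to the label encountered first among the most common."""
    return Counter(judgments).most_common(1)[0][0]

# Three workers say "medication", one says "dosage":
crowd_label = majority_vote(["medication", "medication", "dosage", "medication"])
```

Each annotation unit (e.g., a candidate entity span shown to several workers) gets the label the plurality of workers chose, which is then compared against the expert-annotated gold standard.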


Subject(s)
Crowdsourcing/standards, Natural Language Processing, Social Media, Telemedicine/standards, Clinical Trials as Topic/statistics & numerical data, Crowdsourcing/statistics & numerical data, Humans, Internet, Pilot Projects, Quality Control, Telemedicine/statistics & numerical data
18.
BMC Med Inform Decis Mak ; 13: 53, 2013 Apr 24.
Article in English | MEDLINE | ID: mdl-23617267

ABSTRACT

BACKGROUND: Cincinnati Children's Hospital Medical Center (CCHMC) has built the initial Natural Language Processing (NLP) component to extract medications with their corresponding medical conditions (Indications, Contraindications, Overdosage, and Adverse Reactions) as triples of medication-related information ([(1) drug name]-[(2) medical condition]-[(3) LOINC section header]) for an intelligent database system, in order to improve patient safety and the quality of health care. The Food and Drug Administration's (FDA) drug labels are used to demonstrate the feasibility of building the triples as an intelligent database system task. METHODS: This paper discusses a hybrid NLP system, called AutoMCExtractor, to collect medical conditions (including diseases/disorders and signs/symptoms) from drug labels published by the FDA. Altogether, 6,611 medical conditions in a manually annotated gold standard were used for the system evaluation. The pre-processing step extracted the plain text from the XML files and detected eight related LOINC sections (e.g., Adverse Reactions, Warnings and Precautions) for medical condition extraction. Conditional Random Fields (CRF) classifiers, trained on token, linguistic, and semantic features, were then used for medical condition extraction. Lastly, dictionary-based post-processing corrected boundary-detection errors of the CRF step. We evaluated AutoMCExtractor on manually annotated FDA drug labels and report the results at both the token and span levels. RESULTS: Precision, recall, and F-measure were 0.90, 0.81, and 0.85, respectively, for the span-level exact match; for the token-level evaluation, precision, recall, and F-measure were 0.92, 0.73, and 0.82, respectively. CONCLUSIONS: The results demonstrate that (1) medical conditions can be extracted from FDA drug labels with high performance; and (2) it is feasible to develop a framework for an intelligent database system.
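The span-level "exact match" evaluation reported above can be made concrete with a small sketch; the example spans are hypothetical, and this is the generic metric rather than the paper's evaluation code.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F-measure from raw true/false positive/negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def span_exact_match(gold_spans, pred_spans):
    """A predicted (start, end) span scores only if it matches a gold span
    exactly; a partial overlap counts as both a false positive and a false
    negative, which is why boundary-detection errors hurt span-level scores."""
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)
    return prf(tp, len(pred - gold), len(gold - pred))
```

This strictness explains why the dictionary-based post-processing step, which repairs CRF boundary errors, directly improves the span-level numbers.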


Subject(s)
Adverse Drug Reaction Reporting Systems, Data Mining/methods, Drug Labeling, United States Food and Drug Administration, Humans, Medication Systems, Natural Language Processing, Ohio, United States
19.
J Am Med Inform Assoc ; 20(5): 915-21, 2013.
Article in English | MEDLINE | ID: mdl-23268488

ABSTRACT

OBJECTIVE: The goal of this work was to evaluate machine learning methods, binary classification and sequence labeling, for medication-attribute linkage detection in two clinical corpora. DATA AND METHODS: We double annotated 3000 clinical trial announcements (CTA) and 1655 clinical notes (CN) for medication named entities and their attributes. A binary support vector machine (SVM) classification method with parsimonious feature sets and a conditional random fields (CRF)-based multi-layered sequence labeling (MLSL) model were proposed to identify the linkages between the entities and their corresponding attributes. We evaluated the systems' performance against the human-generated gold standard. RESULTS: The experiments showed that the two machine learning approaches performed statistically significantly better than the baseline rule-based approach. The binary SVM classification achieved 0.94 F-measure with individual tokens as features. The SVM model trained on a parsimonious feature set achieved 0.81 F-measure for CN and 0.87 for CTA. The CRF MLSL method achieved 0.80 F-measure on both corpora. DISCUSSION AND CONCLUSIONS: We compared the novel MLSL method with a binary classification and a rule-based method. The MLSL method performed statistically significantly better than the rule-based method. However, the SVM-based binary classification method was statistically significantly better than the MLSL method for both the CTA and CN corpora. Using parsimonious feature sets, both the SVM-based binary classification and CRF-based MLSL methods achieved high performance in detecting medication name and attribute linkages in CTA and CN.
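Rule-based linkage baselines of the kind both learned methods outperformed are typically proximity heuristics. A hypothetical nearest-mention version (an assumption for illustration, not the paper's actual rules) might look like:

```python
def link_nearest(med_offsets, attr_offsets):
    """Naive rule-based baseline: link each attribute mention to the closest
    medication mention by character offset. Illustrative only; real linkage
    rules also consider sentence boundaries and intervening entities."""
    return [
        (min(med_offsets, key=lambda m: abs(m - a)), a)
        for a in attr_offsets
    ]
```

Such heuristics fail when an attribute (e.g., a dosage) sits between two medication mentions or belongs to a non-adjacent drug, which is exactly where the SVM and CRF models gain their advantage.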


Subject(s)
Artificial Intelligence, Information Storage and Retrieval/methods, Medical Records, Pharmaceutical Preparations, Support Vector Machine, Clinical Trials as Topic, Humans
20.
J Am Med Inform Assoc ; 20(1): 84-94, 2013 Jan 01.
Article in English | MEDLINE | ID: mdl-22859645

ABSTRACT

OBJECTIVE: (1) To evaluate a state-of-the-art natural language processing (NLP)-based approach to automatically de-identify a large set of diverse clinical notes. (2) To measure the impact of de-identification on the performance of information extraction algorithms on the de-identified documents. MATERIAL AND METHODS: A cross-sectional study that included 3503 stratified, randomly selected clinical notes (over 22 note types) from five million documents produced at one of the largest US pediatric hospitals. The sensitivity, precision, and F-measure of two automated de-identification systems for removing all 18 HIPAA-defined protected health information elements were computed. Performance was assessed against a manually generated 'gold standard'. Statistical significance was tested. The automated de-identification performance was also compared with that of two humans on a 10% subsample of the gold standard. The effect of de-identification on the performance of subsequent medication extraction was measured. RESULTS: The gold standard included 30 815 protected health information elements and more than one million tokens. The most accurate NLP method had 91.92% sensitivity (R) and 95.08% precision (P) overall. The performance of the system was indistinguishable from that of human annotators (the annotators' performance was 92.15%(R)/93.95%(P) and 94.55%(R)/88.45%(P) overall, while the best system obtained 92.91%(R)/95.73%(P) on the same text). The impact of automated de-identification was minimal on the utility of the narrative notes for subsequent information extraction, as measured by the sensitivity and precision of medication name extraction. DISCUSSION AND CONCLUSION: NLP-based de-identification shows excellent performance that rivals that of human annotators. Furthermore, unlike manual de-identification, the automated approach scales up to millions of documents quickly and inexpensively.
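As a toy illustration of the de-identification task itself (not the systems evaluated in the study, which rely on statistical NLP rather than a handful of regexes), a pattern-based scrubber for a few HIPAA element types might be sketched as:

```python
import re

# Hedged illustration only: the patterns and the "MRN" label are assumptions.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "MRN": re.compile(r"\bMRN:?\s*\d{6,}\b"),  # medical record number
}

def scrub(text):
    """Replace each matched protected health information span with a
    category placeholder, preserving the rest of the narrative."""
    for tag, pattern in PATTERNS.items():
        text = pattern.sub(f"[{tag}]", text)
    return text
```

Replacing spans with category placeholders, rather than deleting them, is one reason de-identified notes remain usable for downstream extraction tasks such as medication name recognition.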


Subject(s)
Confidentiality, Data Mining, Electronic Health Records, Information Dissemination, Natural Language Processing, Algorithms, Cross-Sectional Studies, Hospitals, Pediatric, Humans, Observer Variation, Reproducibility of Results, Technology Assessment, Biomedical, United States