1.
Nature ; 600(7890): 675-679, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34887591

ABSTRACT

Increased blood lipid levels are heritable risk factors for cardiovascular disease, with prevalence that varies worldwide owing to different dietary patterns and medication use [1]. Despite advances in prevention and treatment, in particular through reducing low-density lipoprotein cholesterol levels [2], heart disease remains the leading cause of death worldwide [3]. Genome-wide association studies (GWAS) of blood lipid levels have led to important biological and clinical insights, as well as new drug targets, for cardiovascular disease. However, most previous GWAS [4-23] have been conducted in populations of European ancestry and may have missed genetic variants that contribute to lipid-level variation in other ancestry groups, owing to differences in allele frequencies, effect sizes and linkage-disequilibrium patterns [24]. Here we conduct a multi-ancestry, genome-wide genetic discovery meta-analysis of lipid levels in approximately 1.65 million individuals, including 350,000 of non-European ancestries. We quantify the gain from studying non-European ancestries and provide evidence to support expanding recruitment of additional ancestries, even with relatively small sample sizes. We find that increasing diversity, rather than studying additional individuals of European ancestry, results in substantial improvements in fine-mapping functional variants and in the portability of polygenic prediction (evaluated in approximately 295,000 individuals from 7 ancestry groupings). Modest gains in the number of discovered loci and ancestry-specific variants were also achieved. As GWAS shift emphasis beyond the identification of genes and fundamental biology towards the use of genetic variants for preventive and precision medicine [25], we anticipate that increased diversity of participants will lead to more accurate and equitable [26] application of polygenic scores in clinical practice.
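The abstract closes by anticipating clinical use of polygenic scores. In its simplest form, a polygenic score is a sum of effect-allele dosages weighted by per-allele GWAS effect sizes. The minimal Python sketch below shows only that weighted sum; the variant IDs and weights are hypothetical, and it is not the scoring method used in the paper (practical pipelines typically add variant selection and re-weighting steps, which are omitted here).

```python
from typing import Dict


def polygenic_score(effect_sizes: Dict[str, float], dosages: Dict[str, float]) -> float:
    """Sum of effect-allele dosages (0, 1 or 2 copies) weighted by per-allele effect sizes.

    Variants missing from `dosages` are skipped rather than imputed.
    """
    return sum(beta * dosages[variant] for variant, beta in effect_sizes.items() if variant in dosages)


# Hypothetical per-allele effect sizes for three made-up variants.
weights = {"rsA": 0.50, "rsB": -0.20, "rsC": 0.10}
# Hypothetical genotype: two copies of rsA's effect allele, one of rsB, none of rsC.
person = {"rsA": 2, "rsB": 1, "rsC": 0}

print(round(polygenic_score(weights, person), 2))  # 0.8
```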


Subject(s)
Cardiovascular Diseases , Genome-Wide Association Study , Cardiovascular Diseases/genetics , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Humans , Linkage Disequilibrium , Multifactorial Inheritance , Polymorphism, Single Nucleotide/genetics , Population Groups
2.
Am J Hum Genet ; 109(8): 1366-1387, 2022 Aug 04.
Article in English | MEDLINE | ID: mdl-35931049

ABSTRACT

A major challenge of genome-wide association studies (GWASs) is to translate phenotypic associations into biological insights. Here, we integrate a large GWAS on blood lipids involving 1.6 million individuals from five ancestries with a wide array of functional genomic datasets to discover regulatory mechanisms underlying lipid associations. We first prioritize lipid-associated genes with expression quantitative trait locus (eQTL) colocalizations and then add chromatin interaction data to narrow the search for functional genes. Polygenic enrichment analysis across 697 annotations from a host of tissues and cell types confirms the central role of the liver in lipid levels and highlights the selective enrichment of adipose-specific chromatin marks in high-density lipoprotein cholesterol and triglycerides. Overlapping transcription factor (TF) binding sites with lipid-associated loci identifies TFs relevant in lipid biology. In addition, we present an integrative framework to prioritize causal variants at GWAS loci, producing a comprehensive list of candidate causal genes and variants with multiple layers of functional evidence. We highlight two of the prioritized genes, CREBRF and RRBP1, which show convergent evidence across functional datasets supporting their roles in lipid biology.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Chromatin/genetics , Genomics , Humans , Lipids/genetics , Polymorphism, Single Nucleotide/genetics
4.
Int J Obes (Lond) ; 45(1): 155-169, 2021 Jan.
Article in English | MEDLINE | ID: mdl-32952152

ABSTRACT

BACKGROUND/OBJECTIVES: The melanocortin-4 receptor (MC4R) plays an essential role in food intake and energy homeostasis. More than 170 MC4R variants have been described over the past two decades, with conflicting reports regarding the prevalence and phenotypic effects of these variants in diverse cohorts. To determine the frequency of MC4R variants in a large cohort of different ancestries, we evaluated the MC4R coding region in 20,537 eMERGE participants with sequencing data plus an additional 77,454 independent individuals with genome-wide genotyping data at this locus. SUBJECTS/METHODS: The sequencing data were obtained from the eMERGE phase III study, in which multisample variant call format (VCF) calls were generated, curated, and annotated. In addition to penetrance estimation using body mass index (BMI) as a binary outcome, GWAS and PheWAS were performed using median BMI in linear regression analyses. All results were adjusted for principal components, age, sex, and genotyping site. RESULTS: Targeted sequencing of MC4R revealed 125 coding variants in 1839 eMERGE participants, including 30 previously unreported coding variants that were predicted to be functionally damaging. Highly penetrant unreported variants included L325I, E308K, D298N, S270F, F261L, T248A, D111V, and Y80F; seven carriers of these variants had class III obesity, defined as BMI ≥ 40 kg/m². In the GWAS analysis, in addition to the known risk haplotype upstream of MC4R (best variant rs6567160; P = 5.36 × 10^-25, Beta = 0.37), a novel rare haplotype was detected that was protective against obesity and encompassed the V103I variant, which has known gain-of-function properties (P = 6.23 × 10^-8, Beta = -0.62). PheWAS analyses extended this protective effect of V103I to type 2 diabetes, diabetic nephropathy, and chronic renal failure, independent of BMI.
CONCLUSIONS: MC4R screening in a large eMERGE cohort confirmed many previous findings, extended the pleiotropic effects of MC4R, and discovered additional rare MC4R alleles that probably contribute to obesity.
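The abstract dichotomizes BMI for penetrance estimation and defines class III obesity as BMI ≥ 40 kg/m². The Python sketch below shows that dichotomization plus a naive penetrance estimate (the share of carriers at or above the cut-off). The class I and II cut-offs follow the WHO convention rather than the abstract, the carrier BMIs are hypothetical, and the binary threshold and covariate adjustment actually used in the study are not stated, so this is an illustration rather than the authors' method.

```python
from typing import List


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def obesity_class(bmi_value: float) -> str:
    """WHO adult obesity classes; class III is BMI >= 40 kg/m^2, as in the abstract."""
    if bmi_value >= 40:
        return "class III"
    if bmi_value >= 35:
        return "class II"
    if bmi_value >= 30:
        return "class I"
    return "not obese"


def naive_penetrance(carrier_bmis: List[float], cutoff: float = 40.0) -> float:
    """Share of variant carriers whose BMI reaches the cut-off (no covariate adjustment)."""
    return sum(b >= cutoff for b in carrier_bmis) / len(carrier_bmis)


print(obesity_class(bmi(130, 1.75)))               # class III (BMI ~42.4)
print(naive_penetrance([42.0, 38.5, 45.1, 29.0]))  # 0.5
```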


Subject(s)
Genetic Variation/genetics , Genome-Wide Association Study , Obesity , Receptor, Melanocortin, Type 4/genetics , Adult , Aged , Body Mass Index , Cohort Studies , Female , Humans , Male , Middle Aged , Obesity/epidemiology , Obesity/genetics
5.
BMC Med ; 17(1): 135, 2019 Jul 17.
Article in English | MEDLINE | ID: mdl-31311600

ABSTRACT

BACKGROUND: Non-alcoholic fatty liver disease (NAFLD) is a common chronic liver illness with a genetically heterogeneous background that can be accompanied by considerable morbidity and attendant health care costs. The pathogenesis and progression of NAFLD are complex, with many unanswered questions. We conducted genome-wide association studies (GWASs) using both adult and pediatric participants from the Electronic Medical Records and Genomics (eMERGE) Network to identify novel genetic contributors to this condition. METHODS: First, a natural language processing (NLP) algorithm was developed, tested, and deployed at each site to identify 1106 NAFLD cases and 8571 controls, and to retrieve histological data from liver tissue for the 235 participants in whom it was available. These included 1242 pediatric participants (396 cases, 846 controls). The algorithm combined billing codes, text queries, laboratory values, and medication records. Next, GWASs were performed on NAFLD cases and controls, together with case-only analyses using histologic scores and liver function tests, adjusting for age, sex, site, ancestry, principal components (PCs), and body mass index (BMI). RESULTS: Consistent with previous results, a robust association was detected for the PNPLA3 gene cluster in participants of European ancestry. At the PNPLA3-SAMM50 region, three SNPs, rs738409, rs738408, and rs3747207, showed the strongest association (best SNP rs738409, p = 1.70 × 10^-20). This effect was consistent in both pediatric (p = 9.92 × 10^-6) and adult (p = 9.73 × 10^-15) cohorts. This variant was also associated with disease severity and the NAFLD Activity Score (NAS) (p = 3.94 × 10^-8, beta = 0.85). PheWAS analysis linked this locus to a spectrum of liver diseases beyond NAFLD, with a novel negative correlation with gout (p = 1.09 × 10^-4). We also identified novel loci for NAFLD disease severity, including one novel locus for NAS near IL17RA (rs5748926, p = 3.80 × 10^-8) and another near ZFP90-CDH1 for fibrosis (rs698718, p = 2.74 × 10^-11).
Post-GWAS and gene-based analyses identified more than 300 genes that were used for functional and pathway enrichment analyses. CONCLUSIONS: In summary, this study demonstrates clear confirmation of a previously described NAFLD risk locus and several novel associations. Further collaborative studies including an ethnically diverse population with well-characterized liver histologic features of NAFLD are needed to further validate the novel findings.
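The abstract reports p-values without stating a significance threshold. The Python sketch below checks the reported GWAS p-values against the conventional genome-wide threshold of 5 × 10^-8 (an assumption here) and converts them to -log10(p), the scale used in Manhattan plots. The pediatric subgroup value falls short of that threshold; in the abstract it is cited as evidence of a consistent effect rather than as a stand-alone discovery. The PheWAS gout p-value is left out because PheWAS results are judged against a different multiple-testing burden.

```python
import math

# Conventional genome-wide significance threshold (assumed; not stated in the abstract).
GENOME_WIDE_ALPHA = 5e-8

# GWAS p-values as reported in the abstract above.
reported_p = {
    "rs738409, NAFLD (all)": 1.70e-20,
    "rs738409, NAFLD (pediatric)": 9.92e-6,
    "rs738409, NAFLD (adult)": 9.73e-15,
    "rs738409, NAS": 3.94e-8,
    "rs5748926 near IL17RA, NAS": 3.80e-8,
    "rs698718 near ZFP90-CDH1, fibrosis": 2.74e-11,
}


def is_genome_wide_significant(p: float, alpha: float = GENOME_WIDE_ALPHA) -> bool:
    """True when the p-value is below the genome-wide threshold."""
    return p < alpha


for label, p in reported_p.items():
    print(f"{label}: -log10(p) = {-math.log10(p):.2f}, "
          f"genome-wide significant: {is_genome_wide_significant(p)}")
```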


Subject(s)
Non-alcoholic Fatty Liver Disease/genetics , Adult , Aged , Body Mass Index , Case-Control Studies , Community Networks/organization & administration , Community Networks/statistics & numerical data , Disease Progression , Electronic Health Records/organization & administration , Electronic Health Records/statistics & numerical data , Female , Genetic Predisposition to Disease , Genome-Wide Association Study , Genomics/organization & administration , Genomics/statistics & numerical data , Humans , Lipase/genetics , Male , Membrane Proteins/genetics , Middle Aged , Morbidity , Non-alcoholic Fatty Liver Disease/epidemiology , Phenotype , Polymorphism, Single Nucleotide , Signal Transduction/genetics
6.
J Biomed Inform ; 99: 103293, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31542521

ABSTRACT

BACKGROUND: Implementation of phenotype algorithms requires phenotype engineers to interpret human-readable algorithms and translate the description (text and flowcharts) into computable phenotypes, a process that can be labor intensive and error prone. To address the critical need for reducing implementation efforts, it is important to develop portable algorithms. METHODS: We conducted a retrospective analysis of phenotype algorithms developed in the Electronic Medical Records and Genomics (eMERGE) network and identified common customization tasks required for implementation. A novel scoring system was developed to quantify portability from three aspects: Knowledge conversion, clause Interpretation, and Programming (KIP). Tasks were grouped into twenty representative categories. Experienced phenotype engineers were asked to estimate the average time spent on each category and to evaluate the time savings enabled by a common data model (CDM), specifically the Observational Medical Outcomes Partnership (OMOP) model, for each category. RESULTS: A total of 485 distinct clauses (phenotype criteria) were identified from 55 phenotype algorithms, corresponding to 1153 customization tasks. In addition to 25 non-phenotype-specific tasks, 46 tasks were related to interpretation, 613 to knowledge conversion, and 469 to programming. A score between 0 and 2 (0 for easy, 1 for moderate, and 2 for difficult portability) was assigned for each aspect, yielding a total KIP score range of 0 to 6. The average clause-wise KIP score reflecting portability was 1.37 ± 1.38. Specifically, the average knowledge (K) score was 0.64 ± 0.66, the interpretation (I) score 0.33 ± 0.55, and the programming (P) score 0.40 ± 0.64. Five percent of the categories could be completed within one hour (median), whereas 70% of the categories took from days to months to complete. The OMOP model can assist with vocabulary mapping tasks.
CONCLUSION: This study presents firsthand knowledge of the substantial implementation efforts in phenotyping and introduces a novel metric (KIP) to measure the portability of phenotype algorithms, quantifying such efforts across the eMERGE Network. Phenotype developers are encouraged to analyze and optimize portability with regard to knowledge, interpretation, and programming. CDMs can be used to improve portability for some 'knowledge-oriented' tasks.
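The KIP score is simple arithmetic: each clause receives a 0-2 score on each of the three aspects, and the clause-wise KIP score is their sum (0-6). The Python sketch below encodes that rule; the clause scores are hypothetical. Because averaging is linear, the mean clause-wise KIP score equals the sum of the per-aspect means, which is why the reported K, I and P averages (0.64 + 0.33 + 0.40) add up to the overall 1.37.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ClauseKIP:
    """Portability scores for one phenotype clause: 0 = easy, 1 = moderate, 2 = difficult."""
    knowledge: int
    interpretation: int
    programming: int

    def __post_init__(self) -> None:
        # Reject values outside the 0-2 scale defined in the abstract.
        for value in (self.knowledge, self.interpretation, self.programming):
            if value not in (0, 1, 2):
                raise ValueError("each KIP aspect must be scored 0, 1 or 2")

    @property
    def total(self) -> int:
        """Clause-wise KIP score, ranging from 0 to 6."""
        return self.knowledge + self.interpretation + self.programming


# Hypothetical clauses from one algorithm.
clauses = [ClauseKIP(1, 0, 0), ClauseKIP(2, 1, 2), ClauseKIP(0, 0, 1)]
print([c.total for c in clauses])                # [1, 5, 1]
print(round(mean(c.total for c in clauses), 2))  # 2.33
```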


Subject(s)
Electronic Health Records/classification , Medical Informatics/methods , Algorithms , Genomics , Humans , Phenotype , Retrospective Studies
7.
J Biomed Inform ; 96: 103253, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31325501

ABSTRACT

BACKGROUND: Implementing clinical phenotypes across a network is labor intensive and potentially error prone. Use of a common data model may facilitate the process. METHODS: Electronic Medical Records and Genomics (eMERGE) sites implemented the Observational Health Data Sciences and Informatics (OHDSI) Observational Medical Outcomes Partnership (OMOP) Common Data Model across their electronic health record (EHR)-linked DNA biobanks. Two previously implemented eMERGE phenotypes were converted to OMOP and implemented across the network. RESULTS: It was feasible to implement the common data model across sites, with laboratory data producing the greatest challenge due to local encoding. Sites were then able to execute the OMOP phenotype in less than one day, as opposed to weeks of effort to manually implement an eMERGE phenotype in their bespoke research EHR databases. Of the sites that could compare the current OMOP phenotype implementation with the original eMERGE phenotype implementation, specific agreement ranged from 100% to 43%, with disagreements due to the original phenotype, the OMOP phenotype, changes in data, and issues in the databases. Using the OMOP query as a standard comparison revealed differences in the original implementations despite starting from the same definitions, code lists, flowcharts, and pseudocode. CONCLUSION: Using a common data model can dramatically speed phenotype implementation at the cost of having to populate that data model, though this will produce a net benefit as the number of phenotype implementations increases. Inconsistencies among the implementations of the original queries point to a potential benefit of using a common data model so that actual phenotype code and logic can be shared, mitigating human error in reinterpretation of a narrative phenotype definition.
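The abstract does not define "specific agreement"; a common choice when comparing two case lists is positive specific agreement, 2a / (2a + b + c), which ignores patients that both implementations classify as non-cases. The Python sketch below computes it from two sets of patient identifiers under that assumption; the IDs are hypothetical.

```python
from typing import Set


def positive_specific_agreement(original: Set[str], omop: Set[str]) -> float:
    """Positive specific agreement, 2a / (2a + b + c).

    a = patients flagged as cases by both implementations;
    b + c = patients flagged by only one of them (the symmetric difference).
    """
    a = len(original & omop)
    discordant = len(original ^ omop)
    if a + discordant == 0:
        raise ValueError("neither implementation flagged any patient")
    return 2 * a / (2 * a + discordant)


# Hypothetical patient IDs returned by each implementation at one site.
original_cases = {"p1", "p2", "p3", "p4"}
omop_cases = {"p2", "p3", "p4", "p5", "p6"}
print(round(positive_specific_agreement(original_cases, omop_cases), 3))  # 0.667
```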


Subject(s)
Attention Deficit Disorder with Hyperactivity/diagnosis , Databases, Factual , Diabetes Mellitus, Type 2/diagnosis , Electronic Health Records , Data Collection , Humans , Medical Informatics , National Human Genome Research Institute (U.S.) , Observational Studies as Topic , Outcome Assessment, Health Care , Phenotype , Research Design , Software , United States
8.
J Med Internet Res ; 21(5): e13047, 2019 May 22.
Article in English | MEDLINE | ID: mdl-31120022

ABSTRACT

BACKGROUND: The continued digitization and maturation of health care information technology has made real-time data easier to access and feasible for more health care organizations. With this increased availability, the promise of using data to algorithmically detect health care-related events in real time has become more of a reality. However, as more researchers and clinicians use real-time data delivery capabilities, it has become apparent that simply gaining access to the data is not a panacea, and some unique data challenges have come to the forefront in the process. OBJECTIVE: The aim of this viewpoint was to highlight some of the challenges that are germane to real-time processing of health care system-generated data and the accurate interpretation of the results. METHODS: Distinct challenges related to the use and processing of real-time data for safety event detection were compiled and reported by several informatics and clinical experts at a quaternary pediatric academic institution. The challenges were collated from the experiences of the researchers implementing real-time event detection on more than half a dozen distinct projects. The challenges are presented in a three-level format: challenge category, specific challenge, and example. RESULTS: In total, 8 major challenge categories were reported, with 13 specific challenges and 9 specific examples detailed to provide context for the challenges. The examples reported are anchored to a specific project using medication order, medication administration record, and smart infusion pump data to detect discrepancies and errors between the 3 datasets. CONCLUSIONS: The use of real-time data to drive safety event detection and clinical decision support is extremely powerful, but it presents its own set of challenges, including data quality and technical complexity.
These challenges must be recognized and accommodated if the full promise of accurate, real-time safety event clinical decision support is to be realized.
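The example project compared medication orders, the medication administration record (MAR) and smart infusion pump data. The Python sketch below shows the core cross-check in its simplest batch form: for each infusion, compare the three recorded rates and flag disagreements or records missing from a source. The identifiers, the 5% tolerance and the dictionary inputs are hypothetical, and the sketch ignores concerns a production real-time system would need to handle, such as unit harmonization, time alignment between sources and data quality.

```python
from typing import Dict, List


def find_discrepancies(
    ordered: Dict[str, float],
    documented: Dict[str, float],
    pump: Dict[str, float],
    rel_tol: float = 0.05,
) -> List[str]:
    """Flag infusion IDs whose ordered, documented (MAR) and pump rates disagree
    by more than `rel_tol` of the ordered rate, or that are missing from any source.

    Assumes all rates are already in the same unit and refer to the same time point.
    """
    flagged = []
    for infusion_id in sorted(ordered.keys() | documented.keys() | pump.keys()):
        rates = [source.get(infusion_id) for source in (ordered, documented, pump)]
        if any(rate is None for rate in rates):
            flagged.append(infusion_id)  # present in one dataset but not another
            continue
        reference = rates[0]
        if any(abs(rate - reference) > rel_tol * abs(reference) for rate in rates[1:]):
            flagged.append(infusion_id)
    return flagged


# Hypothetical rates (mL/h) keyed by a hypothetical infusion identifier.
orders = {"A": 10.0, "B": 5.0, "C": 2.0}
mar = {"A": 10.0, "B": 5.0, "C": 2.0}
pump = {"A": 10.2, "B": 50.0}
print(find_discrepancies(orders, mar, pump))  # ['B', 'C']
```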


Subject(s)
Data Analysis , Decision Support Systems, Clinical/standards , Electronic Health Records/standards , Humans
9.
Am J Respir Crit Care Med ; 195(4): 456-463, 2017 Feb 15.
Article in English | MEDLINE | ID: mdl-27611488

ABSTRACT

RATIONALE: Despite significant advances in knowledge of the genetic architecture of asthma, specific contributors to the variability in the burden between populations remain to be uncovered. OBJECTIVES: To identify additional genetic susceptibility factors of asthma in European American and African American populations. METHODS: A phenotyping algorithm mining electronic medical records was developed and validated to recruit cases with asthma and control subjects from the Electronic Medical Records and Genomics network. Genome-wide association analyses were performed in pediatric and adult asthma cases and control subjects of European American and African American ancestry, followed by meta-analysis. Nominally significant results were reanalyzed conditioning on allergy status. MEASUREMENTS AND MAIN RESULTS: Validation of the algorithm yielded an average positive predictive value of 95.8% for both cases and control subjects. The algorithm accrued 21,644 subjects (65.83% European American and 34.17% African American). We identified four novel population-specific associations with asthma after meta-analyses: loci 6p21.31, 9p21.2, and 10q21.3 in the European American population, and the PTGES gene in African Americans. TEK at 9p21.2, which encodes TIE2, has been shown to be involved in remodeling the airway wall in asthma, and the association remained significant after conditioning on allergy. PTGES, which encodes prostaglandin E synthase, has also been linked to asthma, where deficient prostaglandin E2 synthesis has been associated with airway remodeling. CONCLUSIONS: This study adds to understanding of the genetic architecture of asthma in European Americans and African Americans and reinforces the need to study populations of diverse ethnic backgrounds to identify shared and unique genetic predictors of asthma.


Subject(s)
Asthma/genetics , Black or African American/genetics , Electronic Health Records/statistics & numerical data , Genetic Predisposition to Disease/genetics , Prostaglandin-E Synthases/genetics , White People/genetics , Adolescent , Adult , Airway Remodeling/genetics , Airway Remodeling/immunology , Algorithms , Asthma/ethnology , Child , Child, Preschool , Data Mining/methods , Female , Genetic Predisposition to Disease/ethnology , Genome-Wide Association Study , Humans , Male , Meta-Analysis as Topic , Phenotype , Prevalence , United States
10.
J Biomed Inform ; 57: 124-33, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26190267

ABSTRACT

OBJECTIVE: To improve neonatal patient safety through automated detection of medication administration errors (MAEs) in high alert medications including narcotics, vasoactive medication, intravenous fluids, parenteral nutrition, and insulin using the electronic health record (EHR); to evaluate rates of MAEs in neonatal care; and to compare the performance of computerized algorithms to traditional incident reporting for error detection. METHODS: We developed novel computerized algorithms to identify MAEs within the EHR of all neonatal patients treated in a level four neonatal intensive care unit (NICU) in 2011 and 2012. We evaluated the rates and types of MAEs identified by the automated algorithms and compared their performance to incident reporting. Performance was evaluated by physician chart review. RESULTS: In the combined 2011 and 2012 NICU data sets, the automated algorithms identified MAEs at the following rates: fentanyl, 0.4% (4 errors/1005 fentanyl administration records); morphine, 0.3% (11/4009); dobutamine, 0 (0/10); and milrinone, 0.3% (5/1925). We found higher MAE rates for other vasoactive medications including: dopamine, 11.6% (5/43); epinephrine, 10.0% (289/2890); and vasopressin, 12.8% (54/421). Fluid administration error rates were similar: intravenous fluids, 3.2% (273/8567); parenteral nutrition, 3.2% (649/20124); and lipid administration, 1.3% (203/15227). We also found 13 insulin administration errors with a resulting rate of 2.9% (13/456). MAE rates were higher for medications that were adjusted frequently and fluids administered concurrently. The algorithms identified many previously unidentified errors, demonstrating significantly better sensitivity (82% vs. 5%) and precision (70% vs. 50%) than incident reporting for error recognition. CONCLUSIONS: Automated detection of medication administration errors through the EHR is feasible and performs better than currently used incident reporting systems. 
Automated algorithms may be useful for real-time error identification and mitigation.
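Each rate in the results is the number of errors divided by the number of administration records. The Python sketch below recomputes the reported percentages from the counts given in the abstract, a quick way to check the figures for transcription errors.

```python
# (errors, administration records) as reported in the abstract.
mae_counts = {
    "fentanyl": (4, 1005),
    "morphine": (11, 4009),
    "dobutamine": (0, 10),
    "milrinone": (5, 1925),
    "dopamine": (5, 43),
    "epinephrine": (289, 2890),
    "vasopressin": (54, 421),
    "intravenous fluids": (273, 8567),
    "parenteral nutrition": (649, 20124),
    "lipids": (203, 15227),
    "insulin": (13, 456),
}


def mae_rate_percent(errors: int, administrations: int) -> float:
    """Medication administration error rate as a percentage of administration records."""
    return 100 * errors / administrations


for medication, (errors, total) in mae_counts.items():
    print(f"{medication}: {mae_rate_percent(errors, total):.1f}% ({errors}/{total})")
```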


Subject(s)
Analgesics, Opioid/therapeutic use , Intensive Care Units, Neonatal , Medication Errors , Patient Safety , Risk Management , Automation , Humans , Infant, Newborn , Intensive Care, Neonatal , Medical Order Entry Systems
11.
BMC Med Inform Decis Mak ; 15: 28, 2015 Apr 14.
Article in English | MEDLINE | ID: mdl-25881112

ABSTRACT

BACKGROUND: Manual eligibility screening (ES) for a clinical trial typically requires a labor-intensive review of patient records that utilizes many resources. Leveraging state-of-the-art natural language processing (NLP) and information extraction (IE) technologies, we sought to improve the efficiency of physician decision-making in clinical trial enrollment. In order to markedly reduce the pool of potential candidates for staff screening, we developed an automated ES algorithm to identify patients who meet core eligibility characteristics of an oncology clinical trial. METHODS: We collected narrative eligibility criteria from ClinicalTrials.gov for 55 clinical trials actively enrolling oncology patients in our institution between 12/01/2009 and 10/31/2011. In parallel, our ES algorithm extracted clinical and demographic information from the Electronic Health Record (EHR) data fields to represent profiles of all 215 oncology patients admitted to cancer treatment during the same period. The automated ES algorithm then matched the trial criteria with the patient profiles to identify potential trial-patient matches. Matching performance was validated on a reference set of 169 historical trial-patient enrollment decisions, and workload, precision, recall, negative predictive value (NPV) and specificity were calculated. RESULTS: Without automation, an oncologist would need to review 163 patients per trial on average to replicate the historical patient enrollment for each trial. This workload is reduced by 85% to 24 patients when using automated ES (precision/recall/NPV/specificity: 12.6%/100.0%/100.0%/89.9%). Without automation, an oncologist would need to review 42 trials per patient on average to replicate the patient-trial matches that occur in the retrospective data set. With automated ES this workload is reduced by 90% to four trials (precision/recall/NPV/specificity: 35.7%/100.0%/100.0%/95.5%). 
CONCLUSION: By leveraging NLP and IE technologies, automated ES could dramatically increase the trial screening efficiency of oncologists and enable the participation of small practices, which are often left out of trial enrollment. The algorithm has the potential to significantly reduce the effort needed to execute clinical research at a time when new initiatives of the cancer care community intend to greatly expand both access to trials and the number of available trials.
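The workload figures are relative reductions, and the four reported metrics come from a standard confusion matrix. The Python sketch below reproduces the two reductions from the abstract's numbers (163 to 24 patients per trial, 42 to 4 trials per patient) and defines the metrics. The confusion-matrix counts are hypothetical; they illustrate why 100% recall and 100% NPV are reported together: both follow from the screen producing no false negatives.

```python
from typing import Dict


def workload_reduction_percent(before: float, after: float) -> float:
    """Relative reduction in the number of items an oncologist must review."""
    return 100 * (before - after) / before


def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard confusion-matrix metrics for an eligibility screen."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "npv": tn / (tn + fn),
        "specificity": tn / (tn + fp),
    }


print(f"{workload_reduction_percent(163, 24):.1f}%")  # 85.3% (patients per trial)
print(f"{workload_reduction_percent(42, 4):.1f}%")    # 90.5% (trials per patient)

# Hypothetical counts: a screen that never misses an eligible patient (fn = 0).
print(screening_metrics(tp=10, fp=20, tn=70, fn=0))
```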


Subject(s)
Clinical Trials as Topic/methods , Eligibility Determination/methods , Information Storage and Retrieval/methods , Natural Language Processing , Neoplasms/therapy , Patient Selection , Child , Humans
12.
BMC Med Inform Decis Mak ; 15: 37, 2015 May 06.
Article in English | MEDLINE | ID: mdl-25943550

ABSTRACT

BACKGROUND: In this study we developed and implemented state-of-the-art machine learning (ML) and natural language processing (NLP) technologies and built a computerized algorithm for medication reconciliation. Our specific aims are: (1) to develop a computerized algorithm for medication discrepancy detection between patients' discharge prescriptions (structured data) and medications documented in free-text clinical notes (unstructured data); and (2) to assess the performance of the algorithm on real-world medication reconciliation data. METHODS: We collected clinical notes and discharge prescription lists for all 271 patients enrolled in the Complex Care Medical Home Program at Cincinnati Children's Hospital Medical Center between 1/1/2010 and 12/31/2013. A double-annotated, gold-standard set of medication reconciliation data was created for this collection. We then developed a hybrid algorithm consisting of three processes: (1) an ML algorithm to identify medication entities in clinical notes, (2) a rule-based method to link medication names with their attributes, and (3) an NLP-based hybrid approach to match medications with structured prescriptions in order to detect medication discrepancies. The performance was validated on the gold-standard medication reconciliation data, where precision (P), recall (R), F-value (F) and workload were assessed. RESULTS: The hybrid algorithm achieved 95.0%/91.6%/93.3% P/R/F on medication entity detection and 98.7%/99.4%/99.1% P/R/F on attribute linkage. Medication matching achieved 92.4%/90.7%/91.5% (P/R/F) on identifying matched medications in the gold standard and 88.6%/82.5%/85.5% (P/R/F) on discrepant medications. Combining all processes, the algorithm achieved 92.4%/90.7%/91.5% (P/R/F) and 71.5%/65.2%/68.2% (P/R/F) on identifying the matched and the discrepant medications, respectively.
Error analysis of the algorithm's outputs identified challenges to be addressed in order to improve medication discrepancy detection. CONCLUSION: By leveraging ML and NLP technologies, an end-to-end computerized algorithm achieves promising results in reconciling medications between clinical notes and discharge prescriptions.
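The F-value reported alongside precision and recall is their harmonic mean, F = 2PR / (P + R). The Python sketch below recomputes three of the reported P/R/F triples; because the published P and R are themselves rounded, a recomputed F can differ from the reported value in the last digit for other triples.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F) in percent, taken from the abstract.
reported_prf = [(95.0, 91.6, 93.3), (92.4, 90.7, 91.5), (71.5, 65.2, 68.2)]

for p, r, f in reported_prf:
    print(f"P={p} R={r} -> F={f_measure(p, r):.1f} (reported {f})")
```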


Subject(s)
Algorithms , Drug Prescriptions/standards , Machine Learning , Medication Reconciliation/standards , Natural Language Processing , Patient Discharge/standards , Adult , Humans
13.
J Biomed Inform ; 50: 173-183, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24556292

ABSTRACT

OBJECTIVE: The current study aims to fill the gap in available healthcare de-identification resources by creating a new sharable dataset with realistic Protected Health Information (PHI) without reducing the value of the data for de-identification research. By releasing the annotated gold standard corpus under a Data Use Agreement, we would like to encourage other computational linguists to experiment with our data and develop new machine learning models for de-identification. This paper describes (1) the modifications required by the Institutional Review Board before sharing the de-identification gold standard corpus; (2) our efforts to keep the PHI as realistic as possible; and (3) the tests showing the effectiveness of these efforts in preserving the value of the modified data set for machine learning model development. MATERIALS AND METHODS: In a previous study we built an original de-identification gold standard corpus annotated with true PHI from 3503 randomly selected clinical notes for the 22 most frequent clinical note types of our institution. In the current study we modified the original gold standard corpus to make it suitable for external sharing by replacing HIPAA-specified PHI with newly generated realistic PHI. Finally, we evaluated the research value of this new dataset by comparing the performance of an existing published in-house de-identification system trained on the new de-identification gold standard corpus with the performance of the same system trained on the original corpus. We assessed the potential benefits of using the new de-identification gold standard corpus to identify PHI in the i2b2 and PhysioNet datasets that were released by other groups for de-identification research. We also measured the effectiveness of the i2b2 and PhysioNet de-identification gold standard corpora in identifying PHI in our original clinical notes.
RESULTS: Performance of the de-identification system using the new gold standard corpus as a training set was very close to training on the original corpus (92.56 vs. 93.48 overall F-measures). Best i2b2/PhysioNet/CCHMC cross-training performances were obtained when training on the new shared CCHMC gold standard corpus, although performances were still lower than corpus-specific trainings. DISCUSSION AND CONCLUSION: We successfully modified a de-identification dataset for external sharing while preserving the de-identification research value of the modified gold standard corpus with limited drop in machine learning de-identification performance.
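The core of the corpus modification is replacing each annotated PHI span with a surrogate while keeping the gold-standard annotations aligned with the new text. The Python sketch below shows one way to do that bookkeeping, assuming non-overlapping character-offset spans; the surrogate generator, the example note and the PHI type labels are hypothetical, and the abstract does not describe how the authors generated their realistic PHI.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]  # (start, end, PHI type); end is exclusive


def replace_phi(text: str, spans: List[Span],
                surrogate: Callable[[str, str], str]) -> Tuple[str, List[Span]]:
    """Replace annotated PHI spans with surrogates and shift offsets so the
    gold-standard annotations still point at the replaced text."""
    pieces: List[str] = []
    new_spans: List[Span] = []
    cursor = 0  # position in the original text
    shift = 0   # cumulative length change caused by earlier replacements
    for start, end, phi_type in sorted(spans):
        replacement = surrogate(text[start:end], phi_type)
        pieces.append(text[cursor:start])
        pieces.append(replacement)
        new_start = start + shift
        new_spans.append((new_start, new_start + len(replacement), phi_type))
        shift += len(replacement) - (end - start)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), new_spans


# Hypothetical surrogate generator: fixed fake values per PHI type.
fake_values = {"NAME": "Alvarez", "DATE": "05/06/2011"}
note = "Seen by Dr. Smith on 03/04/2012."
new_note, new_spans = replace_phi(note, [(12, 17, "NAME"), (21, 31, "DATE")],
                                  lambda original, phi_type: fake_values[phi_type])
print(new_note)   # Seen by Dr. Alvarez on 05/06/2011.
print(new_spans)  # [(12, 19, 'NAME'), (23, 33, 'DATE')]
```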


Subject(s)
Medical Informatics , Computer Security , Electronic Health Records , Health Insurance Portability and Accountability Act , United States
14.
J Med Internet Res ; 15(4): e73, 2013 Apr 02.
Article in English | MEDLINE | ID: mdl-23548263

ABSTRACT

BACKGROUND: A high-quality gold standard is vital for supervised, machine learning-based clinical natural language processing (NLP) systems. In clinical NLP projects, expert annotators traditionally create the gold standard. However, traditional annotation is expensive and time-consuming. To reduce the cost of annotation, general NLP projects have turned to crowdsourcing based on Web 2.0 technology, which involves submitting smaller subtasks to a coordinated marketplace of workers on the Internet. Many studies have been conducted in the area of crowdsourcing, but only a few have focused on tasks in the general NLP field and only a handful in the biomedical domain, usually based upon very small pilot sample sizes. In addition, the quality of crowdsourced biomedical NLP corpora was never exceptional when compared with traditionally developed gold standards. Previously reported results on a medical named entity annotation task showed a 0.68 F-measure-based agreement between crowdsourced and traditionally developed corpora. OBJECTIVE: Building upon previous work from general crowdsourcing research, this study investigated the usability of crowdsourcing in the clinical NLP domain, with special emphasis on achieving high agreement between crowdsourced and traditionally developed corpora. METHODS: To build the gold standard for evaluating the crowdsourcing workers' performance, 1042 clinical trial announcements (CTAs) from the ClinicalTrials.gov website were randomly selected and double annotated for medication names, medication types, and linked attributes. For the experiments, we used CrowdFlower, an Amazon Mechanical Turk-based crowdsourcing platform. We calculated sensitivity, precision, and F-measure to evaluate the quality of the crowd's work and tested for statistically significant differences (P<.001, chi-square test) between the crowdsourced and traditionally developed annotations.
RESULTS: The agreement between the crowd's annotations and the traditionally generated corpora was high for (1) annotations (F-measure 0.87 for medication names; 0.73 for medication types) and (2) correction of previous annotations (0.90, medication names; 0.76, medication types), and excellent for (3) linking medications with their attributes (0.96). Simple voting provided the best judgment aggregation approach. There was no statistically significant difference between the crowd and traditionally generated corpora. Our results showed a 27.9% improvement over previously reported results on the medication named entity annotation task. CONCLUSIONS: This study offers three contributions. First, we demonstrated that crowdsourcing is a feasible, inexpensive, fast, and practical approach to collecting high-quality annotations for clinical text (when protected health information was excluded). We believe that well-designed user interfaces and a rigorous quality control strategy for entity annotation and linking were critical to the success of this work. Second, as a further contribution to the Internet-based crowdsourcing field, we will publicly release the JavaScript and CrowdFlower Markup Language infrastructure code needed to use CrowdFlower's quality control and crowdsourcing interfaces for named entity annotation. Finally, to spur future research, we will release the CTA annotations generated by the traditional and crowdsourced approaches.
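
The crowd evaluation described above combines two computations: aggregating several workers' judgments by simple voting, and scoring the aggregate against the gold standard with precision, sensitivity (recall), and F-measure. The Python sketch below illustrates both under stated assumptions: annotations are hypothetical (document, start, end, label) tuples, a span is kept when more than half of the workers marked it, and only exact matches count. It is not the authors' CrowdFlower implementation, and the example data are invented.

```python
from collections import Counter

# Hypothetical annotation record: (document ID, start offset, end offset, label)
Span = tuple[str, int, int, str]


def majority_vote(worker_annotations: list[set[Span]]) -> set[Span]:
    """Keep each span that more than half of the workers marked (simple voting)."""
    counts = Counter(span for spans in worker_annotations for span in spans)
    threshold = len(worker_annotations) / 2
    return {span for span, n in counts.items() if n > threshold}


def precision_recall_f(predicted: set[Span], gold: set[Span]) -> tuple[float, float, float]:
    """Exact-match precision, sensitivity (recall), and balanced F-measure."""
    tp = len(predicted & gold)  # spans that agree exactly with the gold standard
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = {("cta1", 10, 19, "medication"), ("cta1", 40, 48, "medication")}
workers = [
    {("cta1", 10, 19, "medication"), ("cta1", 40, 48, "medication")},
    {("cta1", 10, 19, "medication")},
    {("cta1", 10, 19, "medication"), ("cta1", 60, 66, "medication")},
]
aggregated = majority_vote(workers)
print(precision_recall_f(aggregated, gold))  # (1.0, 0.5, 0.6666666666666666)

# Relative gain of the medication-name F-measure over the earlier 0.68.
print(f"{(0.87 - 0.68) / 0.68:.1%}")  # 27.9%
```

The last line checks the abstract's arithmetic: an F-measure of 0.87 against the previously reported 0.68 is a relative gain of about 27.9%, which appears to be how the improvement figure was derived.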


Subject(s)
Crowdsourcing/standards , Natural Language Processing , Social Media , Telemedicine/standards , Clinical Trials as Topic/statistics & numerical data , Crowdsourcing/statistics & numerical data , Humans , Internet , Pilot Projects , Quality Control , Telemedicine/statistics & numerical data
15.
BMC Med Inform Decis Mak ; 13: 53, 2013 Apr 24.
Article in English | MEDLINE | ID: mdl-23617267

ABSTRACT

BACKGROUND: Cincinnati Children's Hospital Medical Center (CCHMC) has built the initial Natural Language Processing (NLP) component to extract medications with their corresponding medical conditions (Indications, Contraindications, Overdosage, and Adverse Reactions) as triples of medication-related information ([(1) drug name]-[(2) medical condition]-[(3) LOINC section header]) for an intelligent database system, in order to improve patient safety and the quality of health care. Drug labels from the Food and Drug Administration (FDA) are used to demonstrate the feasibility of building these triples for an intelligent database system. METHODS: This paper discusses a hybrid NLP system, called AutoMCExtractor, that collects medical conditions (including diseases/disorders and signs/symptoms) from drug labels published by the FDA. Altogether, 6,611 medical conditions in a manually annotated gold standard were used for the system evaluation. The pre-processing step extracted the plain text from each XML file and detected eight related LOINC sections (e.g., Adverse Reactions, Warnings and Precautions) for medical condition extraction. Conditional Random Fields (CRF) classifiers, trained on token, linguistic, and semantic features, were then used for medical condition extraction. Lastly, dictionary-based post-processing corrected boundary-detection errors of the CRF step. We evaluated AutoMCExtractor on manually annotated FDA drug labels and report the results at both the token and span levels. RESULTS: Precision, recall, and F-measure were 0.90, 0.81, and 0.85, respectively, for span-level exact match; for the token-level evaluation, precision, recall, and F-measure were 0.92, 0.73, and 0.82, respectively. CONCLUSIONS: The results demonstrate that (1) medical conditions can be extracted from FDA drug labels with high performance and (2) it is feasible to develop a framework for an intelligent database system.
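
The abstract reports results at both the span level (exact match) and the token level. One common way to define the two, sketched below in Python as an assumption (the abstract does not give the paper's exact matching rules), is to compare whole [start, end) token spans for the former and the individual tokens they cover for the latter. In this toy example, a single boundary error, the kind the dictionary-based post-processing step targets, lowers the span-level score more than the token-level one.

```python
Span = tuple[int, int]  # [start, end) token offsets (hypothetical representation)


def prf(tp: int, n_pred: int, n_gold: int) -> tuple[float, float, float]:
    """Precision, recall, and F-measure from true-positive and total counts."""
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_gold if n_gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def span_level(pred: set[Span], gold: set[Span]) -> tuple[float, float, float]:
    """Exact match: a predicted span counts only if both boundaries match a gold span."""
    return prf(len(pred & gold), len(pred), len(gold))


def covered_tokens(spans: set[Span]) -> set[int]:
    """Expand spans into the set of token positions they cover."""
    return {i for start, end in spans for i in range(start, end)}


def token_level(pred: set[Span], gold: set[Span]) -> tuple[float, float, float]:
    """Token-level match: each covered token is scored independently."""
    pred_tokens, gold_tokens = covered_tokens(pred), covered_tokens(gold)
    return prf(len(pred_tokens & gold_tokens), len(pred_tokens), len(gold_tokens))


# One correct span and one boundary error (the tagger stopped one token early).
gold = {(0, 3), (5, 7)}
pred = {(0, 2), (5, 7)}
print(span_level(pred, gold))   # (0.5, 0.5, 0.5)
print(token_level(pred, gold))  # precision 1.0, recall 0.8, F about 0.89
```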


Subject(s)
Adverse Drug Reaction Reporting Systems , Data Mining/methods , Drug Labeling , United States Food and Drug Administration , Humans , Medication Systems , Natural Language Processing , Ohio , United States
16.
Sci Rep ; 13(1): 1971, 2023 02 03.
Article in English | MEDLINE | ID: mdl-36737471

ABSTRACT

The electronic Medical Records and Genomics (eMERGE) Network assessed the feasibility of deploying portable phenotype rule-based algorithms with natural language processing (NLP) components added to improve the performance of existing algorithms using electronic health records (EHRs). Based on scientific merit and predicted difficulty, eMERGE selected six existing phenotypes to enhance with NLP. We assessed performance, portability, and ease of use. We summarized lessons learned in terms of (1) challenges; (2) best practices to address those challenges, based on existing evidence and/or eMERGE experience; and (3) opportunities for future research. Adding NLP resulted in improved or unchanged precision and/or recall for all but one algorithm. Portability, phenotyping workflow/process, and technology were major themes. With NLP, development and validation took longer. Besides portability of NLP technology and algorithm replicability, factors to ensure success include privacy protection, technical infrastructure setup, intellectual property agreements, and efficient communication. Workflow improvements can strengthen communication and reduce implementation time. NLP performance varied mainly due to clinical document heterogeneity; therefore, we suggest using semi-structured notes, comprehensive documentation, and customization options. NLP portability is possible with improved phenotype algorithm performance, but careful planning and architecture of the algorithms are essential to support local customizations.


Subject(s)
Electronic Health Records , Natural Language Processing , Genomics , Algorithms , Phenotype
17.
Hosp Pediatr ; 12(12): 1066-1072, 2022 12 01.
Article in English | MEDLINE | ID: mdl-36404764

ABSTRACT

BACKGROUND AND OBJECTIVES: Diagnostic uncertainty is challenging to identify and study in clinical practice. This study compares diagnosis code use and health care utilization between a unique cohort of hospitalized children with uncertain diagnoses (UDs) and matched controls. PATIENTS AND METHODS: This case-control study was conducted at Cincinnati Children's Hospital Medical Center. Cases were defined as patients admitted to the pediatric hospital medicine service who had UDs during their hospitalization. Control patients were matched on age strata, biological sex, and time of year. Outcomes included the type of diagnosis code used (ie, disease- or nondisease-based) and change in code from admission to discharge. Differences in diagnosis codes were evaluated using conditional logistic regression. Health care utilization outcomes included hospital length of stay (LOS), hospital transfer, consulting service utilization, rapid response team activations, escalation to intensive care, and 30-day health care reutilization. Differences in health care utilization were assessed using bivariate statistics. RESULTS: Our final cohort included 240 UD cases and 911 matched controls. Compared with matched controls, UD cases had 8 times the odds of receiving a nondisease-based diagnosis code (odds ratio [OR], 8.0; 95% confidence interval [CI], 5.7-11.2) and 2.5 times the odds of a change in their primary International Classification of Diseases, 10th Revision, diagnosis code between admission and discharge (OR, 2.5; 95% CI, 1.9-3.4). UD cases had a longer average LOS and higher rates of transfer to our main hospital campus, consulting service use, and 30-day readmission. CONCLUSIONS: Hospitalized children with UDs have meaningfully different patterns of diagnosis code use and increased health care utilization compared with matched controls.
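
The study's odds ratios come from conditional logistic regression, which respects the matched design. To make the quantity itself concrete, the Python sketch below computes the simpler crude (unmatched) odds ratio from a 2x2 table with a Woolf log-scale confidence interval. This is a different, plainer technique than the one the study used, so it would not reproduce the reported ORs, and the counts are invented. It also shows the interpretation: an OR of 8.0 means the odds, not the probability, of a nondisease-based code were eight times higher among UD cases.

```python
import math


def odds_ratio_woolf(a: int, b: int, c: int, d: int,
                     z: float = 1.959964) -> tuple[float, float, float]:
    """Crude odds ratio and Woolf (log-scale) 95% confidence interval.

    2x2 layout (hypothetical labels):
        a = cases with the outcome      b = cases without it
        c = controls with the outcome   d = controls without it
    """
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell makes the log odds ratio undefined")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Invented counts, not the study's data.
or_, lower, upper = odds_ratio_woolf(a=20, b=80, c=10, d=90)
print(f"OR {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")  # OR 2.25 (95% CI 0.99-5.09)
```

Because the interval is built on the log scale, it is symmetric around log(OR) but asymmetric around the OR itself, as in the abstract's 5.7-11.2 around 8.0.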


Subject(s)
Hospitalization , Patient Acceptance of Health Care , Child , Humans , Uncertainty , Case-Control Studies , Hospitals, Pediatric
18.
Genome Biol ; 23(1): 268, 2022 12 27.
Article in English | MEDLINE | ID: mdl-36575460

ABSTRACT

BACKGROUND: Genetic variants within nearly 1000 loci are known to contribute to the modulation of blood lipid levels. However, the biological pathways underlying these associations are frequently unknown, which limits understanding of these findings and hinders downstream translational efforts such as drug target discovery. RESULTS: To expand our understanding of the biological pathways and mechanisms controlling blood lipid levels, we leverage a large multi-ancestry meta-analysis (N = 1,654,960) of blood lipids to prioritize putative causal genes for 2286 lipid associations using six gene prediction approaches. Using phenome-wide association study (PheWAS) scans, we identify relationships of genetically predicted lipid levels with other diseases and conditions. We confirm known pleiotropic associations with cardiovascular phenotypes and identify novel associations, notably with cholelithiasis risk. We perform a sex-stratified genome-wide association study (GWAS) meta-analysis of lipid levels and show that 3-5% of autosomal lipid-associated loci demonstrate sex-biased effects. Finally, we report 21 novel lipid loci identified on the X chromosome. Many of the sex-biased autosomal and X chromosome lipid loci show pleiotropic associations with sex hormones, emphasizing the role of hormone regulation in lipid metabolism. CONCLUSIONS: Taken together, our findings provide insights into the biological mechanisms through which associated variants lead to altered lipid levels and potentially to cardiovascular disease risk.
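
Two generic building blocks sit behind analyses like the sex-stratified meta-analysis described above: a fixed-effect inverse-variance weighted (IVW) meta-analysis that pools per-cohort estimates, and a z-test comparing the pooled male and female effects. The Python sketch below shows both under stated assumptions. The abstract does not specify the paper's heterogeneity test or significance thresholds, so the difference-in-effects z statistic here (with an optional correlation term r) is illustrative rather than the authors' method, and the inputs are invented.

```python
import math
from statistics import NormalDist


def ivw_meta(betas: list[float], ses: list[float]) -> tuple[float, float]:
    """Fixed-effect inverse-variance weighted meta-analysis of per-cohort effects."""
    weights = [1 / se**2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1 / total)


def sex_difference_test(beta_m: float, se_m: float, beta_f: float, se_f: float,
                        r: float = 0.0) -> tuple[float, float]:
    """Two-sided z-test of male vs female effect sizes.

    r is the correlation between the two estimates; 0 assumes independent samples.
    """
    se_diff = math.sqrt(se_m**2 + se_f**2 - 2 * r * se_m * se_f)
    z = (beta_m - beta_f) / se_diff
    p = 2 * NormalDist().cdf(-abs(z))  # tail probability computed directly for accuracy
    return z, p


# Invented per-cohort estimates for one variant, not data from the study.
beta_m, se_m = ivw_meta([0.14, 0.16], [0.02, 0.02])
beta_f, se_f = ivw_meta([0.04, 0.06], [0.02, 0.02])
z, p = sex_difference_test(beta_m, se_m, beta_f, se_f)
print(f"male {beta_m:.3f}, female {beta_f:.3f}, z = {z:.2f}, p = {p:.1e}")
```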


Subject(s)
Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Sex Characteristics , Phenotype , Lipids/genetics , Polymorphism, Single Nucleotide , Genetic Pleiotropy
19.
JMIR Med Inform ; 8(9): e19774, 2020 Sep 02.
Article in English | MEDLINE | ID: mdl-32876578

ABSTRACT

BACKGROUND: At present, electronic health records (EHRs) are the central focus of clinical informatics given their role as the primary source of clinical data. Despite their granularity, EHR data rely heavily on manual input and are prone to human error. Many other sources of data exist in the clinical setting, including digital medical devices such as smart infusion pumps. When integrated with prescribing data from EHRs, smart pump records (SPRs) can shed light on actions that take place during the medication use process. However, harmonizing the 2 sources is hindered by multiple technical challenges, and the data quality and utility of SPRs have not been fully realized. OBJECTIVE: This study aims to evaluate the quality and utility of SPRs integrated with EHR data in detecting medication administration errors. Our overarching hypothesis is that SPRs would contribute unique information about the medication use process, enabling more comprehensive detection of discrepancies and potential errors in medication administration. METHODS: We evaluated the medication use process for 9 high-risk medications for patients admitted to the neonatal intensive care unit during a 1-year period. An automated algorithm was developed to align SPRs with their medication orders in the EHRs using patient ID, medication name, and timestamp. The aligned data were manually reviewed by a clinical research coordinator and 2 pediatric physicians to identify discrepancies in medication administration. The data quality of SPRs was assessed as the proportion of information that was linked to valid EHR orders. To evaluate their utility, we compared the frequency and severity of discrepancies captured by the SPR and EHR data, respectively. A novel concordance assessment was also developed to understand the detection power and capabilities of SPR and EHR data.
RESULTS: Approximately 70% of the SPRs contained valid patient IDs and medication names, making them usable for data integration. After combining the 2 sources, the investigative team reviewed 2307 medication orders with 10,575 medication administration records (MARs) and 23,397 SPRs. A total of 321 MAR and 682 SPR discrepancies were identified, with vasopressors showing the highest discrepancy rates, followed by narcotics and total parenteral nutrition. Compared with EHR MARs, the SPRs more often detected substantial dosing discrepancies. The concordance analysis showed little overlap between MAR and SPR discrepancies, with most discrepancies captured by the SPR data. CONCLUSIONS: We integrated smart infusion pump information with EHR data to analyze the most error-prone phases of the medication lifecycle. The findings suggest that SPRs could be a more reliable data source for medication error detection. Ultimately, integrating SPR information with EHR data is imperative to fully detect and mitigate medication administration errors in the clinical setting.
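
The Methods describe an automated algorithm that aligns SPRs with EHR medication orders using patient ID, medication name, and timestamp, plus a data-quality measure based on how many SPRs link to valid orders. The Python sketch below is one hypothetical way to do this: the record fields, the name normalization, and the 30-minute tolerance around each order's active window are all assumptions, not details from the study.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Order:
    order_id: str
    patient_id: str
    medication: str
    start: datetime  # when the order became active (assumed field)
    end: datetime    # when it was discontinued (assumed field)


@dataclass
class PumpRecord:
    patient_id: Optional[str]
    medication: Optional[str]
    timestamp: datetime


def normalize(name: str) -> str:
    """Crude medication-name normalization (assumption): lowercase, collapse spaces."""
    return " ".join(name.lower().split())


def align(records: list[PumpRecord], orders: list[Order],
          tolerance: timedelta = timedelta(minutes=30)) -> list[Optional[str]]:
    """Return, for each pump record, the ID of the first matching order or None.

    A match needs the same patient ID, the same normalized medication name, and a
    pump timestamp inside the order's active window widened by `tolerance`.
    """
    links: list[Optional[str]] = []
    for rec in records:
        match = None
        if rec.patient_id and rec.medication:  # records missing either field cannot link
            for order in orders:
                if (order.patient_id == rec.patient_id
                        and normalize(order.medication) == normalize(rec.medication)
                        and order.start - tolerance <= rec.timestamp <= order.end + tolerance):
                    match = order.order_id
                    break
        links.append(match)
    return links


def linkage_rate(links: list[Optional[str]]) -> float:
    """Share of pump records linked to a valid order (a simple data-quality measure)."""
    return sum(link is not None for link in links) / len(links) if links else 0.0


t0 = datetime(2020, 1, 1, 8, 0)
orders = [Order("o1", "p1", "Dopamine", t0, t0 + timedelta(hours=6))]
records = [
    PumpRecord("p1", "dopamine ", t0 + timedelta(hours=1)),  # links to o1
    PumpRecord(None, "dopamine", t0 + timedelta(hours=1)),   # missing patient ID
    PumpRecord("p1", "Dopamine", t0 + timedelta(hours=8)),   # outside the order window
]
links = align(records, orders)
print(links, round(linkage_rate(links), 2))  # ['o1', None, None] 0.33
```

A nested loop keeps the sketch readable; with tens of thousands of SPRs, indexing orders by (patient ID, normalized name) first would avoid scanning every order for every pump record.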

20.
Int J Med Inform ; 111: 45-50, 2018 Mar.
Article in English | MEDLINE | ID: mdl-29425633

ABSTRACT

BACKGROUND AND AIM: Many clinical research studies claim to collect data that are also captured in the electronic medical record (EMR). We evaluated the potential for EMR data to replace prospective research data collection. METHODS: Using a dataset of 358 surgical patients enrolled in a prospective study, we examined the completeness and agreement of EMR and study entries for several variables, including the patient's stay in the post-operative care unit (PACU), surgical pain relief, and pain medication side effects. RESULTS: For all variables for which completeness was assessed, completeness exceeded 96%. For the adverse event variables, agreement (Cohen's kappa) ranged from slight to substantial: 0.19 (nausea), 0.48 (respiratory depression), and 0.73 (emesis). CONCLUSION: Using EMR data as a replacement for prospective research data collection shows promise but, for now, should be evaluated on a variable-by-variable basis.
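
The Results rely on completeness percentages and Cohen's kappa values with verbal labels from "slight" to "substantial". The Python sketch below shows how these quantities are commonly computed, assuming each variable is a per-patient list of values from the EMR and from the study form. The verbal bands follow the widely used Landis and Koch scale, which is an assumption about how the abstract's labels were assigned; the data are invented.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def completeness(values: Sequence[object]) -> float:
    """Share of entries that are present (not None)."""
    return sum(v is not None for v in values) / len(values)


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two sources corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: products of each label's marginal counts, summed, over n^2.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when both sources use one identical label")
    return (observed - expected) / (1 - expected)


def landis_koch(kappa: float) -> str:
    """Verbal agreement bands from Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Invented nausea flags (1 = yes) for ten patients: EMR vs research study form.
emr = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
study = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0]
k = cohens_kappa(emr, study)
print(round(k, 2), landis_koch(k))  # 0.52 moderate
print(completeness([1, None, 3, 4]))  # 0.75
```

Under these bands, the reported values map to slight (0.19), moderate (0.48), and substantial (0.73), consistent with the abstract's "slight to substantial" summary.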


Subject(s)
Analgesics, Opioid/therapeutic use , Data Collection/methods , Electronic Health Records/statistics & numerical data , Pain, Postoperative/therapy , Humans , Prospective Studies