1.
BMC Med Inform Decis Mak ; 22(Suppl 2): 348, 2024 Mar 03.
Article in English | MEDLINE | ID: mdl-38433189

ABSTRACT

BACKGROUND: Systemic lupus erythematosus (SLE) is a rare autoimmune disorder characterized by an unpredictable course of flares and remission with diverse manifestations. Lupus nephritis, one of the major disease manifestations of SLE for organ damage and mortality, is a key component of lupus classification criteria. Accurately identifying lupus nephritis in electronic health records (EHRs) would therefore benefit large cohort observational studies and clinical trials where characterization of the patient population is critical for recruitment, study design, and analysis. Lupus nephritis can be recognized through procedure codes and structured data, such as laboratory tests. However, other critical information documenting lupus nephritis, such as histologic reports from kidney biopsies and prior medical history narratives, requires sophisticated text processing to mine information from pathology reports and clinical notes. In this study, we developed algorithms to identify lupus nephritis with and without natural language processing (NLP) using EHR data from the Northwestern Medicine Enterprise Data Warehouse (NMEDW). METHODS: We developed five algorithms: a rule-based algorithm using only structured data (baseline algorithm) and four algorithms using different NLP models. The first NLP model applied simple regular expressions for keyword search combined with structured data. The other three NLP models were based on regularized logistic regression and used different sets of features including positive mention of concept unique identifiers (CUIs), number of appearances of CUIs, and a mixture of three components (i.e., a curated list of CUIs, regular expression concepts, structured data), respectively. The baseline algorithm and the best performing NLP algorithm were externally validated on a dataset from Vanderbilt University Medical Center (VUMC).
RESULTS: Our best performing NLP model incorporated features from structured data, regular expression concepts, and mapped concept unique identifiers (CUIs) and showed an improved F-measure in both the NMEDW (0.41 baseline vs 0.79 NLP) and VUMC (0.52 baseline vs 0.93 NLP) datasets compared to the baseline lupus nephritis algorithm. CONCLUSION: Our NLP MetaMap mixed model improved the F-measure greatly compared to the structured data only algorithm in both internal and external validation datasets. The NLP algorithms can serve as powerful tools to accurately identify lupus nephritis phenotype in EHR for clinical research and better targeted therapies.
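The two simplest ingredients of this abstract, a regular-expression keyword search and the F-measure, can be sketched as follows. The keyword pattern and the function names are illustrative assumptions; the abstract does not list the study's actual expressions, and no negation handling is attempted here.

```python
import re

# Hypothetical keyword pattern: the abstract does not list the study's
# actual regular expressions, so this one is for illustration only.
LN_PATTERN = re.compile(r"\b(?:lupus|SLE)\s+nephritis\b", re.IGNORECASE)


def mentions_lupus_nephritis(note: str) -> bool:
    """Return True if the note text contains a keyword match (no negation handling)."""
    return LN_PATTERN.search(note) is not None


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(mentions_lupus_nephritis("Kidney biopsy: class IV lupus nephritis."))
    print(f_measure(tp=8, fp=2, fn=2))
```

A keyword match alone would flag negated mentions such as "no evidence of lupus nephritis", which is one plausible reason the study combined text features with structured data and a trained classifier.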


Subjects
Lupus Erythematosus, Systemic; Lupus Nephritis; Humans; Lupus Nephritis/diagnosis; Electronic Health Records; Natural Language Processing; Phenotype; Rare Diseases
2.
Circulation ; 145(12): 877-891, 2022 03 22.
Article in English | MEDLINE | ID: mdl-34930020

ABSTRACT

BACKGROUND: Sequencing Mendelian arrhythmia genes in individuals without an indication for arrhythmia genetic testing can identify carriers of pathogenic or likely pathogenic (P/LP) variants. However, the extent to which these variants are associated with clinically meaningful phenotypes before or after return of variant results is unclear. In addition, the majority of discovered variants are currently classified as variants of uncertain significance, limiting clinical actionability. METHODS: The eMERGE-III study (Electronic Medical Records and Genomics Phase III) is a multicenter prospective cohort that included 21 846 participants without previous indication for cardiac genetic testing. Participants were sequenced for 109 Mendelian disease genes, including 10 linked to arrhythmia syndromes. Variant carriers were assessed with electronic health record-derived phenotypes and follow-up clinical examination. Selected variants of uncertain significance (n=50) were characterized in vitro with automated electrophysiology experiments in HEK293 cells. RESULTS: As previously reported, 3.0% of participants had P/LP variants in the 109 genes. Herein, we report 120 participants (0.6%) with P/LP arrhythmia variants. Compared with noncarriers, arrhythmia P/LP carriers had a significantly higher burden of arrhythmia phenotypes in their electronic health records. Fifty-four participants had variant results returned. Nineteen of these 54 participants had inherited arrhythmia syndrome diagnoses (primarily long-QT syndrome), and 12 of these 19 diagnoses were made only after variant results were returned (0.05%). After in vitro functional evaluation of 50 variants of uncertain significance, we reclassified 11 variants: 3 to likely benign and 8 to P/LP. CONCLUSIONS: Genome sequencing in a large population without indication for arrhythmia genetic testing identified phenotype-positive carriers of variants in congenital arrhythmia syndrome disease genes. 
As the genomes of large numbers of people are sequenced, the disease risk from rare variants in arrhythmia genes can be assessed by integrating genomic screening, electronic health record phenotypes, and in vitro functional studies. REGISTRATION: URL: https://www.clinicaltrials.gov; Unique identifier: NCT03394859.


Subjects
Arrhythmias, Cardiac; Genetic Testing; Arrhythmias, Cardiac/diagnosis; Arrhythmias, Cardiac/genetics; Genetic Predisposition to Disease; Genetic Testing/methods; Genomics; HEK293 Cells; Humans; Phenotype; Prospective Studies
3.
Am J Hum Genet ; 106(5): 707-716, 2020 05 07.
Article in English | MEDLINE | ID: mdl-32386537

ABSTRACT

Because polygenic risk scores (PRSs) for coronary heart disease (CHD) are derived from mainly European ancestry (EA) cohorts, their validity in African ancestry (AA) and Hispanic ethnicity (HE) individuals is unclear. We investigated associations of "restricted" and genome-wide PRSs with CHD in three major racial and ethnic groups in the U.S. The eMERGE cohort (mean age 48 ± 14 years, 58% female) included 45,645 EA, 7,597 AA, and 2,493 HE individuals. We assessed two restricted PRSs (PRS_Tikkanen and PRS_Tada; 28 and 50 variants, respectively) and two genome-wide PRSs (PRS_metaGRS and PRS_LDPred; 1.7 M and 6.6 M variants, respectively) derived from EA cohorts. Over a median follow-up of 11.1 years, 2,652 incident CHD events occurred. Hazard and odds ratios for the association of PRSs with CHD were similar in EA and HE cohorts but lower in AA cohorts. Genome-wide PRSs were more strongly associated with CHD than restricted PRSs were. PRS_metaGRS, the best performing PRS, was associated with CHD in all three cohorts; hazard ratios (95% CI) per 1 SD increase were 1.53 (1.46-1.60), 1.53 (1.23-1.90), and 1.27 (1.13-1.43) for incident CHD in EA, HE, and AA individuals, respectively. The hazard ratios were comparable in the EA and HE cohorts (p_interaction = 0.77) but were significantly attenuated in AA individuals (p_interaction = 2.9 × 10⁻³). These results highlight the potential clinical utility of PRSs for CHD as well as the need to assemble diverse cohorts to generate ancestry- and ethnicity-specific PRSs.
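As a rough illustration of what "hazard ratio per 1 SD increase" in a PRS means, the sketch below computes a score as a weighted sum of risk-allele dosages, standardizes it within a cohort, and scales a per-SD hazard ratio. The weights, genotypes, and function names are made up for illustration, and the scaling step assumes a log-linear hazards model; none of this reproduces the four published scores.

```python
from statistics import mean, stdev
from typing import List, Sequence


def raw_prs(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1, or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores: Sequence[float]) -> List[float]:
    """Convert raw scores to z-scores so that one unit equals 1 SD of the cohort."""
    mu, sd = mean(scores), stdev(scores)
    return [(s - mu) / sd for s in scores]


def hazard_ratio_for_shift(hr_per_sd: float, n_sd: float) -> float:
    """Assuming log-linear hazards, a shift of n_sd SDs multiplies the hazard by HR**n_sd."""
    return hr_per_sd ** n_sd


if __name__ == "__main__":
    # Made-up per-variant weights and genotypes for three hypothetical people.
    weights = [0.10, 0.20]
    cohort = [[0, 0], [1, 1], [2, 2]]
    z_scores = standardize([raw_prs(person, weights) for person in cohort])
    print(z_scores)
    # Implied hazard ratio for someone 2 SD above the mean, given HR 1.53 per SD.
    print(hazard_ratio_for_shift(1.53, 2))
```

Standardizing within each cohort is what lets the per-SD hazard ratios in the abstract be compared across ancestry groups on a common scale.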


Subjects
Black or African American/genetics; Coronary Disease/genetics; Genetic Predisposition to Disease; Hispanic or Latino/genetics; Multifactorial Inheritance/genetics; White People/genetics; Cohort Studies; Female; Humans; Male; Middle Aged; Odds Ratio
4.
BMC Med Inform Decis Mak ; 22(1): 23, 2022 01 28.
Article in English | MEDLINE | ID: mdl-35090449

ABSTRACT

INTRODUCTION: Currently, one of the commonly used methods for disseminating electronic health record (EHR)-based phenotype algorithms is providing a narrative description of the algorithm logic, often accompanied by flowcharts. A challenge with this mode of dissemination is the potential for under-specification in the algorithm definition, which leads to ambiguity and vagueness. METHODS: This study examines incidents of under-specification that occurred during the implementation of 34 narrative phenotyping algorithms in the electronic Medical Record and Genomics (eMERGE) network. We reviewed the online communication history between algorithm developers and implementers within the Phenotype Knowledge Base (PheKB) platform, where questions could be raised and answered regarding the intended implementation of a phenotype algorithm. RESULTS: We developed a taxonomy of under-specification categories via an iterative review process between two groups of annotators. Under-specifications that lead to ambiguity and vagueness were consistently found across narrative phenotype algorithms developed by all involved eMERGE sites. DISCUSSION AND CONCLUSION: Our findings highlight that under-specification is an impediment to the accuracy and efficiency of the implementation of current narrative phenotyping algorithms, and we propose approaches for mitigating these issues and improved methods for disseminating EHR phenotyping algorithms.


Subjects
Algorithms; Electronic Health Records; Genomics; Humans; Knowledge Bases; Phenotype
5.
Circulation ; 142(17): 1633-1646, 2020 10 27.
Article in English | MEDLINE | ID: mdl-32981348

ABSTRACT

BACKGROUND: Abdominal aortic aneurysm (AAA) is an important cause of cardiovascular mortality; however, its genetic determinants remain incompletely defined. In total, 10 previously identified risk loci explain a small fraction of AAA heritability. METHODS: We performed a genome-wide association study in the Million Veteran Program testing ≈18 million DNA sequence variants with AAA (7642 cases and 172 172 controls) in veterans of European ancestry with independent replication in up to 4972 cases and 99 858 controls. We then used Mendelian randomization to examine the causal effects of blood pressure on AAA. We examined the association of AAA risk variants with aneurysms in the lower extremity, cerebral, and iliac arterial beds, and derived a genome-wide polygenic risk score (PRS) to identify a subset of the population at greater risk for disease. RESULTS: Through a genome-wide association study, we identified 14 novel loci, bringing the total number of known significant AAA loci to 24. In our Mendelian randomization analysis, we demonstrate that a genetic increase of 10 mm Hg in diastolic blood pressure (odds ratio, 1.43 [95% CI, 1.24-1.66]; P=1.6×10⁻⁶), as opposed to systolic blood pressure (odds ratio, 1.06 [95% CI, 0.97-1.15]; P=0.2), likely has a causal relationship with AAA development. We observed that 19 of 24 AAA risk variants associate with aneurysms in at least 1 other vascular territory. A 29-variant PRS was strongly associated with AAA (odds ratio_PRS, 1.26 [95% CI, 1.18-1.36]; P_PRS=2.7×10⁻¹¹ per SD increase in PRS), independent of family history and smoking risk factors (odds ratio_PRS+family history+smoking, 1.24 [95% CI, 1.14-1.35]; P_PRS=1.27×10⁻⁶). Using this PRS, we identified a subset of the population with AAA prevalence greater than that observed in screening trials informing current guidelines.
CONCLUSIONS: We identify novel AAA genetic associations with therapeutic implications and identify a subset of the population at significantly increased genetic risk of AAA independent of family history. Our data suggest that extending current screening guidelines to include testing to identify those with high polygenic AAA risk, once the cost of genotyping becomes comparable with that of screening ultrasound, would significantly increase the yield of current screening at reasonable cost.


Subjects
Aortic Aneurysm, Abdominal/genetics; Humans; Veterans
6.
J Biomed Inform ; 102: 103361, 2020 02.
Article in English | MEDLINE | ID: mdl-31911172

ABSTRACT

Acute Kidney Injury (AKI) is a common clinical syndrome characterized by the rapid loss of kidney excretory function, which aggravates the clinical severity of other diseases in a large number of hospitalized patients. Accurate early prediction of AKI can enable timely interventions and treatments. However, AKI is highly heterogeneous; thus, identification of AKI sub-phenotypes can lead to an improved understanding of the disease pathophysiology and development of more targeted clinical interventions. This study used a memory network-based deep learning approach to discover AKI sub-phenotypes using structured and unstructured electronic health record (EHR) data of patients before AKI diagnosis. We leveraged a real-world critical care EHR corpus including 37,486 ICU stays. Our approach identified three distinct sub-phenotypes. Sub-phenotype I has an average age of 63.03±17.25 years and is characterized by mild loss of kidney excretory function (Serum Creatinine (SCr) 1.55±0.34 mg/dL, estimated Glomerular Filtration Rate (eGFR) 107.65±54.98 mL/min/1.73 m²). These patients are more likely to develop stage I AKI. Sub-phenotype II has an average age of 66.81±10.43 years and is characterized by severe loss of kidney excretory function (SCr 1.96±0.49 mg/dL, eGFR 82.19±55.92 mL/min/1.73 m²). These patients are more likely to develop stage III AKI. Sub-phenotype III has an average age of 65.07±11.32 years and is characterized by moderate loss of kidney excretory function (SCr 1.69±0.32 mg/dL, eGFR 93.97±56.53 mL/min/1.73 m²). These patients are more likely to develop stage II AKI. Both SCr and eGFR differ significantly across the three sub-phenotypes under statistical testing with post hoc analysis, and the conclusion still holds after age adjustment.


Subjects
Acute Kidney Injury; Electronic Health Records; Acute Kidney Injury/diagnosis; Aged; Creatinine; Glomerular Filtration Rate; Humans; Middle Aged; Phenotype
7.
Circulation ; 138(22): 2469-2481, 2018 11 27.
Article in English | MEDLINE | ID: mdl-30571344

ABSTRACT

BACKGROUND: Proteomic approaches allow measurement of thousands of proteins in a single specimen, which can accelerate biomarker discovery. However, applying these technologies to massive biobanks is not currently feasible because of the practical barriers and costs of implementing such assays at scale. To overcome these challenges, we used a "virtual proteomic" approach, linking genetically predicted protein levels to clinical diagnoses in >40 000 individuals. METHODS: We used genome-wide association data from the Framingham Heart Study (n=759) to construct genetic predictors for 1129 plasma protein levels. We validated the genetic predictors for 268 proteins and used them to compute predicted protein levels in 41 288 genotyped individuals in the Electronic Medical Records and Genomics (eMERGE) cohort. We tested associations for each predicted protein with 1128 clinical phenotypes. Lead associations were validated with directly measured protein levels and either low-density lipoprotein cholesterol or subclinical atherosclerosis in the MDCS (Malmö Diet and Cancer Study; n=651). RESULTS: In the virtual proteomic analysis in eMERGE, 55 proteins were associated with 89 distinct diagnoses at a false discovery rate q<0.1. Among these, 13 associations involved lipid (n=7) or atherosclerosis (n=6) phenotypes. We tested each association for validation in MDCS using directly measured protein levels. At Bonferroni-adjusted significance thresholds, levels of apolipoprotein E isoforms were associated with hyperlipidemia, and circulating C-type lectin domain family 1 member B and platelet-derived growth factor receptor-β predicted subclinical atherosclerosis. Odds ratios for carotid atherosclerosis were 1.31 (95% CI, 1.08-1.58; P=0.006) per 1-SD increment in C-type lectin domain family 1 member B and 0.79 (0.66-0.94; P=0.008) per 1-SD increment in platelet-derived growth factor receptor-β.
CONCLUSIONS: We demonstrate a biomarker discovery paradigm to identify candidate biomarkers of cardiovascular and other diseases.
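The abstract reports associations "at a false discovery rate q<0.1" without naming the procedure. One common way to obtain such q-values is the Benjamini-Hochberg step-up adjustment, sketched below as an assumption rather than as the study's documented method; the function name and the example p-values are illustrative.

```python
from typing import List, Sequence


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


if __name__ == "__main__":
    # Made-up p-values for four hypothetical protein-phenotype tests.
    p = [0.01, 0.04, 0.03, 0.5]
    q = bh_qvalues(p)
    significant = [i for i, value in enumerate(q) if value < 0.1]
    print(q, significant)
```

Unlike the Bonferroni threshold used for the MDCS validation step, an FDR cut-off tolerates a controlled fraction of false positives, which suits a discovery screen across many protein-phenotype pairs.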


Subjects
Biomarkers/blood; Carotid Artery Diseases/diagnosis; Genome-Wide Association Study; Proteome/analysis; Adult; Aged; Aged, 80 and over; Carotid Artery Diseases/genetics; Female; Genotype; Humans; Lectins, C-Type/analysis; Male; Middle Aged; Odds Ratio; Phenotype; Polymorphism, Single Nucleotide; Proteomics; Receptor, Platelet-Derived Growth Factor beta/blood
8.
Circulation ; 138(17): 1839-1849, 2018 10 23.
Article in English | MEDLINE | ID: mdl-29703846

ABSTRACT

BACKGROUND: Coronary heart disease (CHD) is a leading cause of death globally. Although therapy with statins decreases circulating levels of low-density lipoprotein cholesterol and the incidence of CHD, additional events occur despite statin therapy in some individuals. The genetic determinants of this residual cardiovascular risk remain unknown. METHODS: We performed a 2-stage genome-wide association study of CHD events during statin therapy. We first identified 3099 cases who experienced CHD events (defined as acute myocardial infarction or the need for coronary revascularization) during statin therapy and 7681 controls without CHD events during comparable intensity and duration of statin therapy from 4 sites in the Electronic Medical Records and Genomics Network. We then sought replication of candidate variants in another 160 cases and 1112 controls from a fifth Electronic Medical Records and Genomics site, which joined the network after the initial genome-wide association study. Finally, we performed a phenome-wide association study for other traits linked to the most significant locus. RESULTS: The meta-analysis identified 7 single nucleotide polymorphisms at a genome-wide level of significance within the LPA/PLG locus associated with CHD events on statin treatment. The most significant association was for an intronic single nucleotide polymorphism within LPA/PLG (rs10455872; minor allele frequency, 0.069; odds ratio, 1.58; 95% confidence interval, 1.35-1.86; P=2.6×10⁻¹⁰). In the replication cohort, rs10455872 was also associated with CHD events (odds ratio, 1.71; 95% confidence interval, 1.14-2.57; P=0.009).
The association of this single nucleotide polymorphism with CHD events was independent of statin-induced change in low-density lipoprotein cholesterol (odds ratio, 1.62; 95% confidence interval, 1.17-2.24; P=0.004) and persisted in individuals with low-density lipoprotein cholesterol ≤70 mg/dL (odds ratio, 2.43; 95% confidence interval, 1.18-4.75; P=0.015). A phenome-wide association study supported the effect of this region on coronary heart disease and did not identify noncardiovascular phenotypes. CONCLUSIONS: Genetic variations at the LPA locus are associated with CHD events during statin therapy independently of the extent of low-density lipoprotein cholesterol lowering. This finding provides support for exploring strategies targeting circulating concentrations of lipoprotein(a) to reduce CHD events in patients receiving statins.


Subjects
Coronary Disease/genetics; Coronary Disease/prevention & control; Dyslipidemias/drug therapy; Dyslipidemias/genetics; Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use; Lipoprotein(a)/genetics; Polymorphism, Single Nucleotide; Case-Control Studies; Coronary Disease/blood; Coronary Disease/diagnosis; Databases, Genetic; Dyslipidemias/blood; Dyslipidemias/diagnosis; Electronic Health Records; Gene Frequency; Genetic Predisposition to Disease; Genome-Wide Association Study; Humans; Hydroxymethylglutaryl-CoA Reductase Inhibitors/adverse effects; Phenotype; Risk Assessment; Risk Factors; Time Factors; Treatment Outcome
9.
BMC Med ; 17(1): 135, 2019 07 17.
Article in English | MEDLINE | ID: mdl-31311600

ABSTRACT

BACKGROUND: Non-alcoholic fatty liver disease (NAFLD) is a common chronic liver illness with a genetically heterogeneous background that can be accompanied by considerable morbidity and attendant health care costs. The pathogenesis and progression of NAFLD are complex, with many unanswered questions. We conducted genome-wide association studies (GWASs) using both adult and pediatric participants from the Electronic Medical Records and Genomics (eMERGE) Network to identify novel genetic contributors to this condition. METHODS: First, a natural language processing (NLP) algorithm was developed, tested, and deployed at each site to identify 1106 NAFLD cases and 8571 controls and histological data from liver tissue in 235 available participants. These include 1242 pediatric participants (396 cases, 846 controls). The algorithm included billing codes, text queries, laboratory values, and medication records. Next, GWASs were performed on NAFLD cases and controls and case-only analyses using histologic scores and liver function tests adjusting for age, sex, site, ancestry, PC, and body mass index (BMI). RESULTS: Consistent with previous results, a robust association was detected for the PNPLA3 gene cluster in participants with European ancestry. At the PNPLA3-SAMM50 region, three SNPs, rs738409, rs738408, and rs3747207, showed the strongest association (best SNP rs738409, p = 1.70 × 10⁻²⁰). This effect was consistent in both pediatric (p = 9.92 × 10⁻⁶) and adult (p = 9.73 × 10⁻¹⁵) cohorts. Additionally, this variant was associated with disease severity and NAFLD Activity Score (NAS) (p = 3.94 × 10⁻⁸, beta = 0.85). PheWAS analysis links this locus to a spectrum of liver diseases beyond NAFLD, with a novel negative correlation with gout (p = 1.09 × 10⁻⁴). We also identified novel loci for NAFLD disease severity, including one novel locus for NAS near IL17RA (rs5748926, p = 3.80 × 10⁻⁸), and another near ZFP90-CDH1 for fibrosis (rs698718, p = 2.74 × 10⁻¹¹).
Post-GWAS and gene-based analyses identified more than 300 genes that were used for functional and pathway enrichment analyses. CONCLUSIONS: In summary, this study demonstrates clear confirmation of a previously described NAFLD risk locus and several novel associations. Further collaborative studies including an ethnically diverse population with well-characterized liver histologic features of NAFLD are needed to further validate the novel findings.


Subjects
Non-alcoholic Fatty Liver Disease/genetics; Adult; Aged; Body Mass Index; Case-Control Studies; Community Networks/organization & administration; Community Networks/statistics & numerical data; Disease Progression; Electronic Health Records/organization & administration; Electronic Health Records/statistics & numerical data; Female; Genetic Predisposition to Disease; Genome-Wide Association Study; Genomics/organization & administration; Genomics/statistics & numerical data; Humans; Lipase/genetics; Male; Membrane Proteins/genetics; Middle Aged; Morbidity; Non-alcoholic Fatty Liver Disease/epidemiology; Phenotype; Polymorphism, Single Nucleotide; Signal Transduction/genetics
10.
Circ Res ; 120(2): 341-353, 2017 Jan 20.
Article in English | MEDLINE | ID: mdl-27899403

ABSTRACT

RATIONALE: Abdominal aortic aneurysm (AAA) is a complex disease with both genetic and environmental risk factors. Together, 6 previously identified risk loci only explain a small proportion of the heritability of AAA. OBJECTIVE: To identify additional AAA risk loci using data from all available genome-wide association studies. METHODS AND RESULTS: Through a meta-analysis of 6 genome-wide association study data sets and a validation study totaling 10 204 cases and 107 766 controls, we identified 4 new AAA risk loci: 1q32.3 (SMYD2), 13q12.11 (LINC00540), 20q13.12 (near PCIF1/MMP9/ZNF335), and 21q22.2 (ERG). In various database searches, we observed no new associations between the lead AAA single nucleotide polymorphisms and coronary artery disease, blood pressure, lipids, or diabetes mellitus. Network analyses identified ERG, IL6R, and LDLR as modifiers of MMP9, with a direct interaction between ERG and MMP9. CONCLUSIONS: The 4 new risk loci for AAA seem to be specific for AAA compared with other cardiovascular diseases and related traits suggesting that traditional cardiovascular risk factor management may only have limited value in preventing the progression of aneurysmal disease.


Subjects
Aortic Aneurysm, Abdominal/diagnosis; Aortic Aneurysm, Abdominal/genetics; Genetic Loci/genetics; Genetic Predisposition to Disease/genetics; Genome-Wide Association Study/methods; Aortic Aneurysm, Abdominal/epidemiology; Genetic Predisposition to Disease/epidemiology; Genetic Variation/genetics; Genome-Wide Association Study/trends; Humans
11.
J Biomed Inform ; 99: 103310, 2019 11.
Article in English | MEDLINE | ID: mdl-31622801

ABSTRACT

BACKGROUND: Standards-based clinical data normalization has become a key component of effective data integration and accurate phenotyping for secondary use of electronic healthcare records (EHR) data. HL7 Fast Healthcare Interoperability Resources (FHIR) is an emerging clinical data standard for exchanging electronic healthcare data and has been used in modeling and integrating both structured and unstructured EHR data for a variety of clinical research applications. The overall objective of this study is to develop and evaluate a FHIR-based EHR phenotyping framework for identification of patients with obesity and its multiple comorbidities from semi-structured discharge summaries leveraging a FHIR-based clinical data normalization pipeline (known as NLP2FHIR). METHODS: We implemented a multi-class and multi-label classification system based on the i2b2 Obesity Challenge task to evaluate the FHIR-based EHR phenotyping framework. Two core parts of the framework are: (a) the conversion of discharge summaries into corresponding FHIR resources - Composition, Condition, MedicationStatement, Procedure and FamilyMemberHistory using the NLP2FHIR pipeline, and (b) the implementation of four machine learning algorithms (logistic regression, support vector machine, decision tree, and random forest) to train classifiers to predict disease state of obesity and 15 comorbidities using features extracted from standard FHIR resources and terminology expansions. We used the macro- and micro-averaged precision (P), recall (R), and F1 score (F1) measures to evaluate the classifier performance. We validated the framework using a second obesity dataset extracted from the MIMIC-III database. RESULTS: Using the NLP2FHIR pipeline, 1237 clinical discharge summaries from the 2008 i2b2 obesity challenge dataset were represented as the instances of the FHIR Composition resource consisting of 5677 records with 16 unique section types. 
After the NLP processing and FHIR modeling, a set of 244,438 FHIR clinical resource instances was generated. Among the four machine learning classifiers, the random forest algorithm performed best, with F1-micro (0.9466)/F1-macro (0.7887) and F1-micro (0.9536)/F1-macro (0.6524) for intuitive classification (reflecting medical professionals' judgments) and textual classification (reflecting judgments based on explicitly reported information about diseases), respectively. The MIMIC-III obesity dataset was successfully integrated for prediction with minimal configuration of the NLP2FHIR pipeline and machine learning models. CONCLUSIONS: The study demonstrated that the FHIR-based EHR phenotyping approach could effectively identify the state of obesity and multiple comorbidities using semi-structured discharge summaries. Our FHIR-based phenotyping approach is a first concrete step towards improving the data aspect of phenotyping portability across EHR systems and enhancing interpretability of the machine learning-based phenotyping algorithms.
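The gap between the micro- and macro-averaged F1 values reported here is typical when a few rare classes are predicted poorly. The sketch below shows the two averages on made-up confusion counts. It assumes macro-F1 is the unweighted mean of per-class F1 (the abstract does not say which macro definition was used), and all names and numbers are illustrative.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one class.
Counts = Tuple[int, int, int]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 expressed directly in counts: 2TP / (2TP + FP + FN)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def micro_f1(per_class: Dict[str, Counts]) -> float:
    """Pool the counts over all classes, then compute a single F1."""
    tp = sum(c[0] for c in per_class.values())
    fp = sum(c[1] for c in per_class.values())
    fn = sum(c[2] for c in per_class.values())
    return f1(tp, fp, fn)


def macro_f1(per_class: Dict[str, Counts]) -> float:
    """Compute F1 per class, then take the unweighted mean."""
    return sum(f1(*c) for c in per_class.values()) / len(per_class)


if __name__ == "__main__":
    # Made-up counts: one frequent class predicted well, one rare class predicted poorly.
    counts = {"frequent": (90, 10, 10), "rare": (1, 1, 3)}
    print(micro_f1(counts), macro_f1(counts))
```

Because micro-averaging pools counts, the frequent class dominates it, while macro-averaging gives the rare class equal weight and so drops sharply when that class is missed.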


Subjects
Electronic Health Records/classification; Health Information Interoperability; Obesity/epidemiology; Patient Discharge; Adult; Algorithms; Body Mass Index; Comorbidity; Female; Humans; Machine Learning; Male; Phenotype; Software
12.
J Biomed Inform ; 99: 103293, 2019 11.
Article in English | MEDLINE | ID: mdl-31542521

ABSTRACT

BACKGROUND: Implementation of phenotype algorithms requires phenotype engineers to interpret human-readable algorithms and translate the description (text and flowcharts) into computable phenotypes - a process that can be labor intensive and error prone. To address the critical need for reducing the implementation efforts, it is important to develop portable algorithms. METHODS: We conducted a retrospective analysis of phenotype algorithms developed in the Electronic Medical Records and Genomics (eMERGE) network and identified common customization tasks required for implementation. A novel scoring system was developed to quantify portability from three aspects: Knowledge conversion, clause Interpretation, and Programming (KIP). Tasks were grouped into twenty representative categories. Experienced phenotype engineers were asked to estimate the average time spent on each category and evaluate time saving enabled by a common data model (CDM), specifically the Observational Medical Outcomes Partnership (OMOP) model, for each category. RESULTS: A total of 485 distinct clauses (phenotype criteria) were identified from 55 phenotype algorithms, corresponding to 1153 customization tasks. In addition to 25 non-phenotype-specific tasks, 46 tasks are related to interpretation, 613 tasks are related to knowledge conversion, and 469 tasks are related to programming. A score between 0 and 2 (0 for easy, 1 for moderate, and 2 for difficult portability) is assigned for each aspect, yielding a total KIP score range of 0 to 6. The average clause-wise KIP score to reflect portability is 1.37 ± 1.38. Specifically, the average knowledge (K) score is 0.64 ± 0.66, interpretation (I) score is 0.33 ± 0.55, and programming (P) score is 0.40 ± 0.64. 5% of the categories can be completed within one hour (median). 70% of the categories take from days to months to complete. The OMOP model can assist with vocabulary mapping tasks.
CONCLUSION: This study presents firsthand knowledge of the substantial implementation efforts in phenotyping and introduces a novel metric (KIP) to measure portability of phenotype algorithms for quantifying such efforts across the eMERGE Network. Phenotype developers are encouraged to analyze and optimize portability with regard to knowledge, interpretation, and programming. CDMs can be used to improve the portability for some 'knowledge-oriented' tasks.
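The KIP arithmetic described in this abstract (three aspects scored 0 to 2, summed to a clause total of 0 to 6, then averaged across clauses) can be sketched as below. The class and function names and the example clauses are hypothetical; how a rater actually assigns each aspect score is not specified in the abstract and is not modeled here.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable


@dataclass(frozen=True)
class ClauseScore:
    """Portability scores for one phenotype clause: 0 easy, 1 moderate, 2 difficult."""
    knowledge: int
    interpretation: int
    programming: int

    def __post_init__(self) -> None:
        for name in ("knowledge", "interpretation", "programming"):
            value = getattr(self, name)
            if value not in (0, 1, 2):
                raise ValueError(f"{name} score must be 0, 1, or 2, got {value}")

    @property
    def total(self) -> int:
        """Total KIP score for the clause, ranging from 0 to 6."""
        return self.knowledge + self.interpretation + self.programming


def mean_kip(clauses: Iterable[ClauseScore]) -> float:
    """Average clause-wise KIP score across an algorithm or a set of algorithms."""
    return mean(clause.total for clause in clauses)


if __name__ == "__main__":
    # Two made-up clauses: one fully portable, one needing moderate to difficult work.
    clauses = [ClauseScore(0, 0, 0), ClauseScore(2, 1, 1)]
    print([clause.total for clause in clauses], mean_kip(clauses))
```

Keeping the three aspect scores separate, rather than storing only the total, makes it possible to report the per-aspect averages (K, I, P) alongside the overall score, as the abstract does.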


Subjects
Electronic Health Records/classification; Medical Informatics/methods; Algorithms; Genomics; Humans; Phenotype; Retrospective Studies
13.
J Biomed Inform ; 96: 103253, 2019 08.
Article in English | MEDLINE | ID: mdl-31325501

ABSTRACT

BACKGROUND: Implementing clinical phenotypes across a network is labor intensive and potentially error prone. Use of a common data model may facilitate the process. METHODS: Electronic Medical Records and Genomics (eMERGE) sites implemented the Observational Health Data Sciences and Informatics (OHDSI) Observational Medical Outcomes Partnership (OMOP) Common Data Model across their electronic health record (EHR)-linked DNA biobanks. Two previously implemented eMERGE phenotypes were converted to OMOP and implemented across the network. RESULTS: It was feasible to implement the common data model across sites, with laboratory data producing the greatest challenge due to local encoding. Sites were then able to execute the OMOP phenotype in less than one day, as opposed to weeks of effort to manually implement an eMERGE phenotype in their bespoke research EHR databases. Of the sites that could compare the current OMOP phenotype implementation with the original eMERGE phenotype implementation, specific agreement ranged from 100% to 43%, with disagreements due to the original phenotype, the OMOP phenotype, changes in data, and issues in the databases. Using the OMOP query as a standard comparison revealed differences in the original implementations despite starting from the same definitions, code lists, flowcharts, and pseudocode. CONCLUSION: Using a common data model can dramatically speed phenotype implementation at the cost of having to populate that data model, though this will produce a net benefit as the number of phenotype implementations increases. Inconsistencies among the implementations of the original queries point to a potential benefit of using a common data model so that actual phenotype code and logic can be shared, mitigating human error in reinterpretation of a narrative phenotype definition.


Subjects
Attention Deficit Disorder with Hyperactivity/diagnosis , Databases, Factual , Diabetes Mellitus, Type 2/diagnosis , Electronic Health Records , Data Collection , Humans , Medical Informatics , National Human Genome Research Institute (U.S.) , Observational Studies as Topic , Outcome Assessment, Health Care , Phenotype , Research Design , Software , United States
14.
Am J Hum Genet ; 97(4): 512-20, 2015 Oct 01.
Article in English | MEDLINE | ID: mdl-26365338

ABSTRACT

Hereditary hemochromatosis (HH) is a common autosomal-recessive disorder associated with pathogenic HFE variants, most commonly those resulting in p.Cys282Tyr and p.His63Asp. Recommendations on returning incidental findings of HFE variants in individuals undergoing genome-scale sequencing should be informed by penetrance estimates of HH in unselected samples. We used the eMERGE Network, a multicenter cohort with genotype data linked to electronic medical records, to estimate the diagnostic rate and clinical penetrance of HH in 98 individuals homozygous for the variant coding for HFE p.Cys282Tyr and 397 compound heterozygotes with variants resulting in p.[His63Asp];[Cys282Tyr]. The diagnostic rate of HH in males was 24.4% for p.Cys282Tyr homozygotes and 3.5% for compound heterozygotes (p < 0.001); in females, it was 14.0% for p.Cys282Tyr homozygotes and 2.3% for compound heterozygotes (p < 0.001). Only males showed differences across genotypes in transferrin saturation levels (100% of homozygotes versus 37.5% of compound heterozygotes with transferrin saturation > 50%; p = 0.003), serum ferritin levels (77.8% versus 33.3% with serum ferritin > 300 ng/ml; p = 0.006), and diabetes (44.7% versus 28.0%; p = 0.03). No differences were found in the prevalence of heart disease, arthritis, or liver disease, except for the rate of liver biopsy (10.9% versus 1.8% [p = 0.013] in males; 9.1% versus 2% [p = 0.035] in females). Given the higher rate of HH diagnosis than in prior studies, the high penetrance of iron overload, and the frequency of at-risk genotypes, in addition to other suggested actionable adult-onset genetic conditions, opportunistic screening should be considered for p.[Cys282Tyr];[Cys282Tyr] individuals with existing genomic data.


Subjects
Genetic Variation/genetics , Hemochromatosis/epidemiology , Hemochromatosis/genetics , Histocompatibility Antigens Class I/genetics , Membrane Proteins/genetics , Adult , Aged , Amino Acid Substitution , Child , Cohort Studies , Female , Follow-Up Studies , Genotype , Hemochromatosis/diagnosis , Hemochromatosis Protein , Heterozygote , Homozygote , Humans , Male , Middle Aged , Penetrance , Prognosis , United States/epidemiology
15.
Am J Respir Crit Care Med ; 195(4): 456-463, 2017 Feb 15.
Article in English | MEDLINE | ID: mdl-27611488

ABSTRACT

RATIONALE: Despite significant advances in knowledge of the genetic architecture of asthma, specific contributors to the variability in the burden between populations remain uncovered. OBJECTIVES: To identify additional genetic susceptibility factors of asthma in European American and African American populations. METHODS: A phenotyping algorithm mining electronic medical records was developed and validated to recruit cases with asthma and control subjects from the Electronic Medical Records and Genomics network. Genome-wide association analyses were performed in pediatric and adult asthma cases and control subjects with European American and African American ancestry followed by metaanalysis. Nominally significant results were reanalyzed conditioning on allergy status. MEASUREMENTS AND MAIN RESULTS: The validation of the algorithm yielded an average of 95.8% positive predictive values for both cases and control subjects. The algorithm accrued 21,644 subjects (65.83% European American and 34.17% African American). We identified four novel population-specific associations with asthma after metaanalyses: loci 6p21.31, 9p21.2, and 10q21.3 in the European American population, and the PTGES gene in African Americans. TEK at 9p21.2, which encodes TIE2, has been shown to be involved in remodeling the airway wall in asthma, and the association remained significant after conditioning by allergy. PTGES, which encodes the prostaglandin E synthase, has also been linked to asthma, where deficient prostaglandin E2 synthesis has been associated with airway remodeling. CONCLUSIONS: This study adds to understanding of the genetic architecture of asthma in European Americans and African Americans and reinforces the need to study populations of diverse ethnic backgrounds to identify shared and unique genetic predictors of asthma.


Subjects
Asthma/genetics , Black or African American/genetics , Electronic Health Records/statistics & numerical data , Genetic Predisposition to Disease/genetics , Prostaglandin-E Synthases/genetics , White People/genetics , Adolescent , Adult , Airway Remodeling/genetics , Airway Remodeling/immunology , Algorithms , Asthma/ethnology , Child , Child, Preschool , Data Mining/methods , Female , Genetic Predisposition to Disease/ethnology , Genome-Wide Association Study , Humans , Male , Meta-Analysis as Topic , Phenotype , Prevalence , United States
16.
Gastrointest Endosc ; 84(2): 296-303.e1, 2016 Aug.
Article in English | MEDLINE | ID: mdl-26828760

ABSTRACT

BACKGROUND AND AIMS: Endoscopic resection (ER) is a safe and effective treatment for nonmalignant complex colorectal polyps (complex polyps). Surgical resection (SR) remains prevalent despite limited outcomes data. We aimed to evaluate SR outcomes for complex polyps and compare SR outcomes to those of ER. METHODS: We performed a single-center, retrospective, cohort study of all patients undergoing SR (2003-2013) and ER (2011-2013) for complex polyps. We excluded patients with invasive carcinoma from the SR cohort. Primary outcomes were 12-month adverse event (AE) rate, length of stay (LOS), and costs. SR outcomes over a 3-year period (2011-2013) were compared with the overlapping ER cohort. RESULTS: Over the 11-year period, 359 patients (mean [± SD] age 64 ± 11 years) underwent SR (58% laparoscopic) for complex polyps. In total, 17% experienced an AE, and 3% required additional surgery; 12-month mortality was 1%. Including readmissions, median LOS was 5 days (IQR 4-7 days), and costs were $14,528. When an AE occurred, costs ($25,557 vs $14,029; P < .0001) and LOS (11 vs 5 days; P < .0001) significantly increased. From 2011 to 2013, 198 patients were referred for ER, and 73 underwent primary SR (70% laparoscopic). There was a lower AE rate for ER versus primary SR (10% vs 18%; P = .09). ER costs (including rescue SR, when required) were lower than those of primary SR ($2152 vs $15,264; P < .0001). CONCLUSIONS: AEs occur in approximately one-sixth of patients after SR for complex polyps. ER (accounting for rescue SR caused by malignancy, AEs, or incomplete resection) is associated with markedly lower costs than SR. These data should be used when counseling patients about treatment options for complex polyps.


Subjects
Adenoma/surgery , Colectomy , Colonic Polyps/surgery , Colonoscopy , Colorectal Neoplasms/surgery , Endoscopic Mucosal Resection , Health Care Costs , Length of Stay/statistics & numerical data , Postoperative Complications/epidemiology , Aged , Female , Humans , Length of Stay/economics , Logistic Models , Male , Middle Aged , Postoperative Complications/economics , Retrospective Studies , United States
17.
BMC Infect Dis ; 16(1): 684, 2016 Nov 17.
Article in English | MEDLINE | ID: mdl-27855652

ABSTRACT

BACKGROUND: Community-associated methicillin-resistant Staphylococcus aureus (CA-MRSA) is one of the most common causes of skin and soft tissue infections in the United States, and a variety of genetic host factors are suspected to be risk factors for recurrent infection. Based on the CDC definition, we have developed and validated an electronic health record (EHR)-based CA-MRSA phenotype algorithm utilizing both structured and unstructured data. METHODS: The algorithm was validated at three eMERGE consortium sites, and positive predictive value, negative predictive value, and sensitivity were calculated. The algorithm was then run and data were collected across a total of seven sites. The resulting data were used in GWAS analysis. RESULTS: Across seven sites, the CA-MRSA phenotype algorithm identified a total of 349 cases and 7761 controls among the genotyped European and African American biobank populations. PPV ranged from 68 to 100% for cases and 96 to 100% for controls; sensitivity ranged from 94 to 100% for cases and 75 to 100% for controls. Frequency of cases in the populations varied widely by site. There were no plausible GWAS-significant (p < 5E-8) findings. CONCLUSIONS: Differences in EHR data representation and screening patterns across sites may have affected identification of cases and controls and accounted for varying frequencies across sites. Future work identifying these patterns is necessary.
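
Illustrative note: the validation measures named in this abstract (positive predictive value, negative predictive value, sensitivity) follow from chart-review counts in the usual way. The sketch below is a minimal illustration under stated assumptions, not the study's code: the `ChartReview` type, the `validation_metrics` function, and all counts are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChartReview:
    """Hypothetical chart-review counts for one site (not study data)."""
    tp: int  # algorithm says case, reviewer confirms case
    fp: int  # algorithm says case, reviewer says not a case
    fn: int  # algorithm says control, reviewer says case
    tn: int  # algorithm says control, reviewer confirms control


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    # Return None instead of dividing by zero when a cell group is empty.
    return numerator / denominator if denominator else None


def validation_metrics(r: ChartReview) -> dict:
    """Standard definitions of PPV, NPV and sensitivity."""
    return {
        "ppv": _ratio(r.tp, r.tp + r.fp),
        "npv": _ratio(r.tn, r.tn + r.fn),
        "sensitivity": _ratio(r.tp, r.tp + r.fn),
    }


if __name__ == "__main__":
    example = ChartReview(tp=45, fp=5, fn=3, tn=47)  # made-up counts
    for name, value in validation_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

Computing these per site, as sketched here, is what lets ranges such as "68 to 100%" be reported across sites.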


Subjects
Algorithms , Electronic Health Records , Genome-Wide Association Study/methods , Methicillin-Resistant Staphylococcus aureus , Phenotype , Staphylococcal Infections/diagnosis , Adult , Case-Control Studies , Community-Acquired Infections/diagnosis , Community-Acquired Infections/genetics , Female , Genetic Predisposition to Disease , Humans , Male , Risk Factors , Sensitivity and Specificity , Staphylococcal Infections/genetics , United States
18.
J Biomed Inform ; 62: 232-42, 2016 Aug.
Article in English | MEDLINE | ID: mdl-27392645

ABSTRACT

The Quality Data Model (QDM) is an information model developed by the National Quality Forum for representing electronic health record (EHR)-based electronic clinical quality measures (eCQMs). In conjunction with the HL7 Health Quality Measures Format (HQMF), QDM contains core elements that make it a promising model for representing EHR-driven phenotype algorithms for clinical research. However, the current QDM specification is available only as descriptive documents suitable for human readability and interpretation, but not for machine consumption. The objective of the present study is to develop and evaluate a data element repository (DER) for providing machine-readable QDM data element service APIs to support phenotype algorithm authoring and execution. We used the ISO/IEC 11179 metadata standard to capture the structure for each data element, and leverage Semantic Web technologies to facilitate semantic representation of these metadata. We observed there are a number of underspecified areas in the QDM, including the lack of model constraints and pre-defined value sets. We propose a harmonization with the models developed in HL7 Fast Healthcare Interoperability Resources (FHIR) and Clinical Information Modeling Initiatives (CIMI) to enhance the QDM specification and enable the extensibility and better coverage of the DER. We also compared the DER with the existing QDM implementation utilized within the Measure Authoring Tool (MAT) to demonstrate the scalability and extensibility of our DER-based approach.


Subjects
Algorithms , Electronic Health Records , Phenotype , Biomedical Research , Databases, Factual , Humans , Semantics
19.
Am J Epidemiol ; 182(3): 235-43, 2015 Aug 01.
Article in English | MEDLINE | ID: mdl-26093003

ABSTRACT

We used electronic health record data from 162 patients enrolled in the NUgene Project (2002-2013) to determine demographic factors associated with long-term (1 to 9.5 years; mean = 5.6 years) weight loss following Roux-en-Y gastric bypass surgery. Ninety-nine (61.1%) patients self-reported white, and 63 (38.9%) self-reported black, mixed, or missing race. The average percent weight loss was -33.4% (standard deviation, 9.3) at 1 year after surgery and -30.7% (standard deviation, 12.5) at the last follow-up point. We used linear mixed and semiparametric trajectory models to test the association of surgical and demographic factors (height, surgery age, surgery weight, surgery body mass index, marital status, sex, educational level, site, International Classification of Diseases code, Current Procedural Terminology code, Hispanic ethnicity, and self-reported race) with long-term percent weight loss and pattern of weight loss. We found that black, mixed, and missing races (combined) in comparison with white race were associated with a decreased percent weight loss of -4.31% (95% confidence interval: -7.30, -1.32) and were less likely to have higher and sustained percent weight loss (P = 0.04). We also found that less obese patients were less likely to have higher and sustained percent weight loss (P = 0.01). These findings may be helpful to patients in setting expectations after weight loss surgery.


Subjects
Gastric Bypass/statistics & numerical data , Linear Models , Obesity, Morbid/surgery , Weight Loss/ethnology , Analysis of Variance , Female , Follow-Up Studies , Humans , Male , Middle Aged , Self Report , Time , Treatment Outcome
20.
Hum Genet ; 133(1): 95-109, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24026423

ABSTRACT

Platelets are enucleated cell fragments derived from megakaryocytes that play key roles in hemostasis and in the pathogenesis of atherothrombosis and cancer. Platelet traits are highly heritable and identification of genetic variants associated with platelet traits and assessing their pleiotropic effects may help to understand the role of underlying biological pathways. We conducted an electronic medical record (EMR)-based study to identify common variants that influence inter-individual variation in the number of circulating platelets (PLT) and mean platelet volume (MPV), by performing a genome-wide association study (GWAS). We characterized genetic variants associated with MPV and PLT using functional, pathway and disease enrichment analyses; we assessed pleiotropic effects of such variants by performing a phenome-wide association study (PheWAS) with a wide range of EMR-derived phenotypes. A total of 13,582 participants in the electronic MEdical Records and GEnomic network had data for PLT and 6,291 participants had data for MPV. We identified five chromosomal regions associated with PLT and eight associated with MPV at genome-wide significance (P < 5E-8). In addition, we replicated 20 SNPs [out of 56 SNPs (α: 0.05/56 = 9E-4)] influencing PLT and 22 SNPs [out of 29 SNPs (α: 0.05/29 = 2E-3)] influencing MPV in a published meta-analysis of GWAS of PLT and MPV. While our GWAS did not find any new associations, our functional analyses revealed that genes in these regions influence thrombopoiesis and encode kinases, membrane proteins, proteins involved in cellular trafficking, transcription factors, proteasome complex subunits, proteins of signal transduction pathways, proteins involved in megakaryocyte development, and platelet production and hemostasis. 
PheWAS using a single-SNP Bonferroni correction for 1,368 diagnoses (0.05/1368 = 3.6E-5) revealed that several variants in these genes have pleiotropic associations with myocardial infarction, autoimmune, and hematologic disorders. We conclude that multiple genetic loci influence interindividual variation in platelet traits and also have significant pleiotropic effects; the related genes are in multiple functional pathways including those relevant to thrombopoiesis.
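
Illustrative note: the significance thresholds quoted in this abstract are Bonferroni corrections, that is, a family-wise alpha of 0.05 divided by the number of tests (56 and 29 replication SNPs, 1,368 PheWAS diagnoses). The sketch below reproduces that arithmetic; the function name `bonferroni_threshold` is a hypothetical helper, not something taken from the paper.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


if __name__ == "__main__":
    # Test counts are the ones quoted in the abstract.
    for label, n in [("PLT replication SNPs", 56),
                     ("MPV replication SNPs", 29),
                     ("PheWAS diagnoses", 1368)]:
        print(f"{label}: 0.05/{n} = {bonferroni_threshold(0.05, n):.3e}")
```

The computed values agree with the abstract's figures to one significant digit; the abstract reports them in abbreviated form.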


Subjects
Genetic Pleiotropy , Genome-Wide Association Study/methods , Mean Platelet Volume , Platelet Count , Polymorphism, Single Nucleotide , Adult , Aged , Aged, 80 and over , Cardiovascular Diseases/genetics , Chromosomes, Human/genetics , Female , Genetic Loci , Hemostasis , Humans , Male , Meta-Analysis as Topic , Middle Aged , Phenotype , Thrombopoiesis/genetics