Search | Cancer Prevention and Control

Multi-ancestry genome- and phenome-wide association studies of diverticular disease in electronic health records with natural language processing enriched phenotyping algorithm.

Joo, Yoonjung Yoonie; Pacheco, Jennifer A; Thompson, William K; Rasmussen-Torvik, Laura J; Rasmussen, Luke V; Lin, Frederick T J; Andrade, Mariza de; Borthwick, Kenneth M; Bottinger, Erwin; Cagan, Andrew; Carrell, David S; Denny, Joshua C; Ellis, Stephen B; Gottesman, Omri; Linneman, James G; Pathak, Jyotishman; Peissig, Peggy L; Shang, Ning; Tromp, Gerard; Veerappan, Annapoorani; Smith, Maureen E; Chisholm, Rex L; Gawron, Andrew J; Hayes, M Geoffrey; Kho, Abel N.

PLoS One ; 18(5): e0283553, 2023.

Article in English | MEDLINE | ID: mdl-37196047

ABSTRACT

OBJECTIVE: Diverticular disease (DD) is one of the most prevalent conditions encountered by gastroenterologists, affecting ~50% of Americans before the age of 60. Our aim was to identify genetic risk variants and clinical phenotypes associated with DD, leveraging multiple electronic health record (EHR) data sources of 91,166 multi-ancestry participants with a Natural Language Processing (NLP) technique. MATERIALS AND METHODS: We developed an NLP-enriched phenotyping algorithm that incorporated colonoscopy or abdominal imaging reports to identify patients with diverticulosis and diverticulitis from multicenter EHRs. We performed genome-wide association studies (GWAS) of DD in European, African and multi-ancestry participants, followed by phenome-wide association studies (PheWAS) of the risk variants to identify their potential comorbid/pleiotropic effects in clinical phenotypes. RESULTS: Our algorithm significantly improved patient classification performance for DD analysis (algorithm PPVs ≥ 0.94), with up to a 3.5-fold increase in the number of identified patients compared with the traditional method. Ancestry-stratified analyses of diverticulosis and diverticulitis in the identified subjects replicated the well-established associations between ARHGAP15 loci and DD, showing overall intensified GWAS signals in diverticulitis patients compared to diverticulosis patients. Our PheWAS analyses identified significant associations between the DD GWAS variants and circulatory system, genitourinary, and neoplastic EHR phenotypes. DISCUSSION: As the first multi-ancestry GWAS-PheWAS study, this work shows that heterogeneous EHR data can be mapped through an integrative analytical pipeline to reveal significant genotype-phenotype associations with clinical interpretation.
CONCLUSION: A systematic framework to process unstructured EHR data with NLP could advance deep and scalable phenotyping for better patient identification and facilitate etiological investigation of a disease with multilayered data.

Subjects

Diverticular Diseases; Diverticulitis; Diverticulum; Humans; Electronic Health Records; Genome-Wide Association Study/methods; Natural Language Processing; Phenotype; Algorithms; Polymorphism, Single Nucleotide

Heritability and genome-wide association study of benign prostatic hyperplasia (BPH) in the eMERGE network.

Hellwege, Jacklyn N; Stallings, Sarah; Torstenson, Eric S; Carroll, Robert; Borthwick, Kenneth M; Brilliant, Murray H; Crosslin, David; Gordon, Adam; Hripcsak, George; Jarvik, Gail P; Linneman, James G; Devi, Parimala; Peissig, Peggy L; Sleiman, Patrick A M; Hakonarson, Hakon; Ritchie, Marylyn D; Verma, Shefali Setia; Shang, Ning; Denny, Josh C; Roden, Dan M; Velez Edwards, Digna R; Edwards, Todd L.

Sci Rep ; 9(1): 6077, 2019 04 15.

Article in English | MEDLINE | ID: mdl-30988330

ABSTRACT

Benign prostatic hyperplasia (BPH) results in a significant public health burden due to the morbidity caused by the disease and by many of the available remedies. As many as 70% of men over 70 will develop BPH. Few studies have been conducted to discover the genetic determinants of BPH risk. Understanding the biological basis for this condition may provide necessary insight for development of novel pharmaceutical therapies or risk prediction. We have evaluated SNP-based heritability of BPH in two cohorts and conducted a genome-wide association study (GWAS) of BPH risk using 2,656 cases and 7,763 controls identified from the Electronic Medical Records and Genomics (eMERGE) network. SNP-based heritability estimates suggest that roughly 60% of the phenotypic variation in BPH is accounted for by genetic factors. We used logistic regression to model BPH risk as a function of principal components of ancestry, age, and imputed genotype data, with meta-analysis performed using METAL. The top result was on chromosome 22 in SYN3 at rs2710383 (p-value = 4.6 × 10⁻⁷; Odds Ratio = 0.69, 95% confidence interval = 0.55-0.83). Other suggestive signals were near genes GLGC, UNCA13, SORCS1 and between BTBD3 and SPTLC3. We also evaluated genetically-predicted gene expression in prostate tissue. The most significant result was with increasing predicted expression of ETV4 (chr17; p-value = 0.0015). Overexpression of this gene has been associated with poor prognosis in prostate cancer. In conclusion, although no genome-wide significant variants were identified for BPH susceptibility, we present evidence supporting the heritability of this phenotype, identify suggestive signals, and evaluate the association between BPH and genetically-predicted gene expression in prostate.

Subjects

Genetic Predisposition to Disease; Inheritance Patterns; Prostatic Hyperplasia/genetics; Aged; Aged, 80 and over; Biomarkers/metabolism; Case-Control Studies; Electronic Health Records/statistics & numerical data; Gene Expression Profiling; Genome-Wide Association Study; Genotyping Techniques; Humans; Male; Middle Aged; Polymorphism, Single Nucleotide; Prostate/pathology; Prostatic Hyperplasia/epidemiology; Prostatic Hyperplasia/pathology

Probing the Virtual Proteome to Identify Novel Disease Biomarkers.

Mosley, Jonathan D; Benson, Mark D; Smith, J Gustav; Melander, Olle; Ngo, Debby; Shaffer, Christian M; Ferguson, Jane F; Herzig, Matthew S; McCarty, Catherine A; Chute, Christopher G; Jarvik, Gail P; Gordon, Adam S; Palmer, Melody R; Crosslin, David R; Larson, Eric B; Carrell, David S; Kullo, Iftikhar J; Pacheco, Jennifer A; Peissig, Peggy L; Brilliant, Murray H; Kitchner, Terrie E; Linneman, James G; Namjou, Bahram; Williams, Marc S; Ritchie, Marylyn D; Borthwick, Kenneth M; Kiryluk, Krzysztof; Mentch, Frank D; Sleiman, Patrick M; Karlson, Elizabeth W; Verma, Shefali S; Zhu, Yineng; Vasan, Ramachandran S; Yang, Qiong; Denny, Josh C; Roden, Dan M; Gerszten, Robert E; Wang, Thomas J.

Circulation ; 138(22): 2469-2481, 2018 11 27.

Article in English | MEDLINE | ID: mdl-30571344

ABSTRACT

BACKGROUND: Proteomic approaches allow measurement of thousands of proteins in a single specimen, which can accelerate biomarker discovery. However, applying these technologies to massive biobanks is not currently feasible because of the practical barriers and costs of implementing such assays at scale. To overcome these challenges, we used a "virtual proteomic" approach, linking genetically predicted protein levels to clinical diagnoses in >40 000 individuals. METHODS: We used genome-wide association data from the Framingham Heart Study (n=759) to construct genetic predictors for 1129 plasma protein levels. We validated the genetic predictors for 268 proteins and used them to compute predicted protein levels in 41 288 genotyped individuals in the Electronic Medical Records and Genomics (eMERGE) cohort. We tested associations for each predicted protein with 1128 clinical phenotypes. Lead associations were validated with directly measured protein levels and either low-density lipoprotein cholesterol or subclinical atherosclerosis in the MDCS (Malmö Diet and Cancer Study; n=651). RESULTS: In the virtual proteomic analysis in eMERGE, 55 proteins were associated with 89 distinct diagnoses at a false discovery rate q<0.1. Among these, 13 associations involved lipid (n=7) or atherosclerosis (n=6) phenotypes. We tested each association for validation in MDCS using directly measured protein levels. At Bonferroni-adjusted significance thresholds, levels of apolipoprotein E isoforms were associated with hyperlipidemia, and circulating C-type lectin domain family 1 member B and platelet-derived growth factor receptor-β predicted subclinical atherosclerosis. Odds ratios for carotid atherosclerosis were 1.31 (95% CI, 1.08-1.58; P=0.006) per 1-SD increment in C-type lectin domain family 1 member B and 0.79 (0.66-0.94; P=0.008) per 1-SD increment in platelet-derived growth factor receptor-β.
CONCLUSIONS: We demonstrate a biomarker discovery paradigm to identify candidate biomarkers of cardiovascular and other diseases.
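
As a reading aid, the reported odds ratios, confidence intervals, and P values can be checked against each other. The sketch below is not the authors' analysis: it assumes the intervals are Wald intervals that are symmetric on the log-odds scale, and the function name `wald_p_from_ci` and the variable names are hypothetical.

```python
import math


def wald_p_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate two-sided p-value from an odds ratio and its 95% CI.

    Assumption (not stated in the abstract): the CI is a Wald interval,
    i.e. log(OR) +/- z_crit * SE.
    """
    # Recover the standard error of log(OR) from the CI width on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Values as reported in the abstract (per 1-SD increment in each protein).
p_clec1b = wald_p_from_ci(1.31, 1.08, 1.58)  # C-type lectin domain family 1 member B
p_pdgfrb = wald_p_from_ci(0.79, 0.66, 0.94)  # platelet-derived growth factor receptor-β
print(f"{p_clec1b:.4f} {p_pdgfrb:.4f}")
```

Under that assumption the recovered values come out at roughly 0.005 and 0.009, close to the reported P=0.006 and P=0.008; small differences are expected because the published intervals are rounded to two decimals.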

Subjects

Biomarkers/blood; Carotid Artery Diseases/diagnosis; Genome-Wide Association Study; Proteome/analysis; Adult; Aged; Aged, 80 and over; Carotid Artery Diseases/genetics; Female; Genotype; Humans; Lectins, C-Type/analysis; Male; Middle Aged; Odds Ratio; Phenotype; Polymorphism, Single Nucleotide; Proteomics; Receptor, Platelet-Derived Growth Factor beta/blood

A case study evaluating the portability of an executable computable phenotype algorithm across multiple institutions and electronic health record environments.

Pacheco, Jennifer A; Rasmussen, Luke V; Kiefer, Richard C; Campion, Thomas R; Speltz, Peter; Carroll, Robert J; Stallings, Sarah C; Mo, Huan; Ahuja, Monika; Jiang, Guoqian; LaRose, Eric R; Peissig, Peggy L; Shang, Ning; Benoit, Barbara; Gainer, Vivian S; Borthwick, Kenneth; Jackson, Kathryn L; Sharma, Ambrish; Wu, Andy Yizhou; Kho, Abel N; Roden, Dan M; Pathak, Jyotishman; Denny, Joshua C; Thompson, William K.

J Am Med Inform Assoc ; 25(11): 1540-1546, 2018 11 01.

Article in English | MEDLINE | ID: mdl-30124903

ABSTRACT

Electronic health record (EHR) algorithms for defining patient cohorts are commonly shared as free-text descriptions that require human intervention both to interpret and to implement. We developed the Phenotype Execution and Modeling Architecture (PhEMA, http://projectphema.org) to author and execute standardized computable phenotype algorithms. With PhEMA, we converted an algorithm for benign prostatic hyperplasia, developed for the electronic Medical Records and Genomics network (eMERGE), into a standards-based computable format. Eight sites (7 within eMERGE) received the computable algorithm, and 6 successfully executed it against local data warehouses and/or i2b2 instances. Blinded random chart review of cases selected by the computable algorithm showed PPV ≥90%, and 3 out of 5 sites had >90% overlap of selected cases when comparing the computable algorithm to their original eMERGE implementation. This case study demonstrates the potential use of PhEMA computable representations to automate phenotyping across different EHR systems, but also highlights some ongoing challenges.
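
The two validation metrics named in this abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not define how overlap was computed (the sketch takes the original implementation's case set as the denominator), the function names are hypothetical, and the counts and patient IDs are invented, not taken from the study.

```python
def positive_predictive_value(confirmed_cases: int, reviewed_cases: int) -> float:
    """PPV = chart-confirmed cases / algorithm-selected cases that were reviewed."""
    if reviewed_cases <= 0:
        raise ValueError("reviewed_cases must be positive")
    if not 0 <= confirmed_cases <= reviewed_cases:
        raise ValueError("confirmed_cases must be between 0 and reviewed_cases")
    return confirmed_cases / reviewed_cases


def overlap_fraction(computable_ids: set, original_ids: set) -> float:
    """Share of the original implementation's cases also selected by the
    computable algorithm (assumed definition of 'overlap')."""
    if not original_ids:
        raise ValueError("original_ids must not be empty")
    return len(computable_ids & original_ids) / len(original_ids)


# Hypothetical site: 46 of 50 reviewed charts confirmed as true cases.
ppv = positive_predictive_value(46, 50)

# Hypothetical patient IDs selected by each implementation at one site.
overlap = overlap_fraction({"p1", "p2", "p3", "p4"}, {"p2", "p3", "p4", "p5"})
print(f"PPV={ppv:.2f} overlap={overlap:.2f}")
```

A site would meet the thresholds quoted in the abstract when `ppv` is at least 0.90 and `overlap` exceeds 0.90; the hypothetical site above meets the first but not the second.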

Subjects

Algorithms; Electronic Health Records; Phenotype; Prostatic Hyperplasia/diagnosis; Data Warehousing; Databases, Factual; Genomics; Humans; Male; Organizational Case Studies; Prostatic Hyperplasia/genetics
