Results 1 - 10 of 10
1.
Circulation ; 138(22): 2469-2481, 2018 11 27.
Article in English | MEDLINE | ID: mdl-30571344

ABSTRACT

BACKGROUND: Proteomic approaches allow measurement of thousands of proteins in a single specimen, which can accelerate biomarker discovery. However, applying these technologies to massive biobanks is not currently feasible because of the practical barriers and costs of implementing such assays at scale. To overcome these challenges, we used a "virtual proteomic" approach, linking genetically predicted protein levels to clinical diagnoses in >40 000 individuals. METHODS: We used genome-wide association data from the Framingham Heart Study (n=759) to construct genetic predictors for 1129 plasma protein levels. We validated the genetic predictors for 268 proteins and used them to compute predicted protein levels in 41 288 genotyped individuals in the Electronic Medical Records and Genomics (eMERGE) cohort. We tested associations for each predicted protein with 1128 clinical phenotypes. Lead associations were validated with directly measured protein levels and either low-density lipoprotein cholesterol or subclinical atherosclerosis in the MDCS (Malmö Diet and Cancer Study; n=651). RESULTS: In the virtual proteomic analysis in eMERGE, 55 proteins were associated with 89 distinct diagnoses at a false discovery rate q<0.1. Among these, 13 associations involved lipid (n=7) or atherosclerosis (n=6) phenotypes. We tested each association for validation in MDCS using directly measured protein levels. At Bonferroni-adjusted significance thresholds, levels of apolipoprotein E isoforms were associated with hyperlipidemia, and circulating C-type lectin domain family 1 member B and platelet-derived growth factor receptor-β predicted subclinical atherosclerosis. Odds ratios for carotid atherosclerosis were 1.31 (95% CI, 1.08-1.58; P=0.006) per 1-SD increment in C-type lectin domain family 1 member B and 0.79 (0.66-0.94; P=0.008) per 1-SD increment in platelet-derived growth factor receptor-β.
CONCLUSIONS: We demonstrate a biomarker discovery paradigm to identify candidate biomarkers of cardiovascular and other diseases.
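
The abstract reports associations at a false discovery rate q<0.1 but does not say how the q-values were computed. As a minimal sketch, assuming the widely used Benjamini-Hochberg procedure (an assumption, not confirmed by the source), adjusted q-values could be obtained as follows. The function name `benjamini_hochberg` and any p-values passed to it are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the original order.

    Assumption: the study's FDR method is not stated in the abstract;
    BH is used here only as a common illustrative choice.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def significant_at(pvalues, q_threshold=0.1):
    """Indices of tests whose BH q-value falls below the threshold."""
    return [i for i, q in enumerate(benjamini_hochberg(pvalues)) if q < q_threshold]
```

With one p-value per protein-phenotype pair, `significant_at(pvalues, 0.1)` would return the positions of the pairs that pass the q<0.1 cut-off under this assumed procedure.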


Subject(s)
Biomarkers/blood, Carotid Artery Diseases/diagnosis, Genome-Wide Association Study, Proteome/analysis, Adult, Aged, Aged 80 and over, Carotid Artery Diseases/genetics, Female, Genotype, Humans, C-Type Lectins/analysis, Male, Middle Aged, Odds Ratio, Phenotype, Single Nucleotide Polymorphism, Proteomics, Platelet-Derived Growth Factor Receptor beta/blood
2.
Circ Res ; 120(2): 341-353, 2017 Jan 20.
Article in English | MEDLINE | ID: mdl-27899403

ABSTRACT

RATIONALE: Abdominal aortic aneurysm (AAA) is a complex disease with both genetic and environmental risk factors. Together, 6 previously identified risk loci only explain a small proportion of the heritability of AAA. OBJECTIVE: To identify additional AAA risk loci using data from all available genome-wide association studies. METHODS AND RESULTS: Through a meta-analysis of 6 genome-wide association study data sets and a validation study totaling 10 204 cases and 107 766 controls, we identified 4 new AAA risk loci: 1q32.3 (SMYD2), 13q12.11 (LINC00540), 20q13.12 (near PCIF1/MMP9/ZNF335), and 21q22.2 (ERG). In various database searches, we observed no new associations between the lead AAA single nucleotide polymorphisms and coronary artery disease, blood pressure, lipids, or diabetes mellitus. Network analyses identified ERG, IL6R, and LDLR as modifiers of MMP9, with a direct interaction between ERG and MMP9. CONCLUSIONS: The 4 new risk loci for AAA seem to be specific for AAA compared with other cardiovascular diseases and related traits suggesting that traditional cardiovascular risk factor management may only have limited value in preventing the progression of aneurysmal disease.


Subject(s)
Abdominal Aortic Aneurysm/diagnosis, Abdominal Aortic Aneurysm/genetics, Genetic Loci/genetics, Genetic Predisposition to Disease/genetics, Genome-Wide Association Study/methods, Abdominal Aortic Aneurysm/epidemiology, Genetic Predisposition to Disease/epidemiology, Genetic Variation/genetics, Genome-Wide Association Study/trends, Humans
3.
BMC Infect Dis ; 16(1): 684, 2016 11 17.
Article in English | MEDLINE | ID: mdl-27855652

ABSTRACT

BACKGROUND: Community-associated methicillin-resistant Staphylococcus aureus (CA-MRSA) is one of the most common causes of skin and soft tissue infections in the United States, and a variety of genetic host factors are suspected to be risk factors for recurrent infection. Based on the CDC definition, we have developed and validated an electronic health record (EHR) based CA-MRSA phenotype algorithm utilizing both structured and unstructured data. METHODS: The algorithm was validated at three eMERGE consortium sites, and positive predictive value, negative predictive value, and sensitivity were calculated. The algorithm was then run and data collected across seven total sites. The resulting data were used in GWAS analysis. RESULTS: Across seven sites, the CA-MRSA phenotype algorithm identified a total of 349 cases and 7761 controls among the genotyped European and African American biobank populations. PPV ranged from 68 to 100% for cases and 96 to 100% for controls; sensitivity ranged from 94 to 100% for cases and 75 to 100% for controls. Frequency of cases in the populations varied widely by site. There were no plausible GWAS-significant (p < 5 × 10⁻⁸) findings. CONCLUSIONS: Differences in EHR data representation and screening patterns across sites may have affected identification of cases and controls and accounted for varying frequencies across sites. Future work identifying these patterns is necessary.
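
The validation metrics named in the abstract follow directly from chart-review counts. The sketch below shows the standard definitions of positive predictive value, negative predictive value, and sensitivity. The function name `validation_metrics` is hypothetical, and any counts passed to it are illustrative inputs, not figures from the study.

```python
def validation_metrics(tp, fp, tn, fn):
    """Compute PPV, NPV and sensitivity from chart-review counts.

    tp: algorithm-positive, confirmed on review
    fp: algorithm-positive, not confirmed on review
    tn: algorithm-negative, confirmed negative on review
    fn: algorithm-negative, but positive on review
    """
    return {
        "ppv": tp / (tp + fp),          # share of flagged cases that are real
        "npv": tn / (tn + fn),          # share of flagged controls that are real
        "sensitivity": tp / (tp + fn),  # share of real cases that were flagged
    }
```

Computing these per site, as the abstract describes, would amount to calling the function once with each site's own counts.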


Subject(s)
Algorithms, Electronic Health Records, Genome-Wide Association Study/methods, Methicillin-Resistant Staphylococcus aureus, Phenotype, Staphylococcal Infections/diagnosis, Adult, Case-Control Studies, Community-Acquired Infections/diagnosis, Community-Acquired Infections/genetics, Female, Genetic Predisposition to Disease, Humans, Male, Risk Factors, Sensitivity and Specificity, Staphylococcal Infections/genetics, United States
4.
PLoS One ; 18(5): e0283553, 2023.
Article in English | MEDLINE | ID: mdl-37196047

ABSTRACT

OBJECTIVE: Diverticular disease (DD) is one of the most prevalent conditions encountered by gastroenterologists, affecting ~50% of Americans before the age of 60. Our aim was to identify genetic risk variants and clinical phenotypes associated with DD, leveraging multiple electronic health record (EHR) data sources of 91,166 multi-ancestry participants with a Natural Language Processing (NLP) technique. MATERIALS AND METHODS: We developed an NLP-enriched phenotyping algorithm that incorporated colonoscopy or abdominal imaging reports to identify patients with diverticulosis and diverticulitis from multicenter EHRs. We performed genome-wide association studies (GWAS) of DD in European, African and multi-ancestry participants, followed by phenome-wide association studies (PheWAS) of the risk variants to identify their potential comorbid/pleiotropic effects in clinical phenotypes. RESULTS: Our developed algorithm showed a significant improvement in patient classification performance for DD analysis (algorithm PPVs ≥ 0.94), with up to a 3.5-fold increase in the number of identified patients compared with the traditional method. Ancestry-stratified analyses of diverticulosis and diverticulitis of the identified subjects replicated the well-established associations between ARHGAP15 loci and DD, showing overall intensified GWAS signals in diverticulitis patients compared to diverticulosis patients. Our PheWAS analyses identified significant associations between the DD GWAS variants and circulatory system, genitourinary, and neoplastic EHR phenotypes. DISCUSSION: As the first multi-ancestry GWAS-PheWAS study, we showcased that heterogeneous EHR data can be mapped through an integrative analytical pipeline and reveal significant genotype-phenotype associations with clinical interpretation.
CONCLUSION: A systematic framework to process unstructured EHR data with NLP could advance a deep and scalable phenotyping for better patient identification and facilitate etiological investigation of a disease with multilayered data.


Subject(s)
Diverticular Diseases, Diverticulitis, Diverticulum, Humans, Electronic Health Records, Genome-Wide Association Study/methods, Natural Language Processing, Phenotype, Algorithms, Single Nucleotide Polymorphism
5.
J Clin Sleep Med ; 16(2): 175-183, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31992429

ABSTRACT

STUDY OBJECTIVES: We examined the performance of a simple algorithm to accurately distinguish cases of diagnosed obstructive sleep apnea (OSA) and noncases using the electronic health record (EHR) across six health systems in the United States. METHODS: Retrospective analysis of EHR data was performed. The algorithm defined cases as individuals with ≥ 2 instances of specific International Classification of Diseases (ICD)-9 and/or ICD-10 diagnostic codes (327.20, 327.23, 327.29, 780.51, 780.53, 780.57, G4730, G4733 and G4739) related to sleep apnea on separate dates in their EHR. Noncases were defined by the absence of these codes. Using chart reviews on 120 cases and 100 noncases at each site (n = 1,320 total), positive predictive value (PPV) and negative predictive value (NPV) were calculated. RESULTS: The algorithm showed excellent performance across sites, with a PPV (95% confidence interval) of 97.1 (95.6, 98.2) and NPV of 95.5 (93.5, 97.0). Similar performance was seen at each site, with all NPV and PPV estimates ≥ 90% apart from a somewhat lower PPV of 87.5 (80.2, 92.8) at one site. A modified algorithm of ≥ 3 instances improved PPV to 94.9 (88.5, 98.3) at this site, but excluded an additional 18.3% of cases. Thus, performance may be further improved by requiring additional codes, but this reduces the number of determinate cases. CONCLUSIONS: A simple EHR-based case-identification algorithm for diagnosed OSA showed excellent predictive characteristics in a multisite sample from the United States. Future analyses should be performed to understand the effect of undiagnosed disease in EHR-defined noncases. This algorithm has wide-ranging applications for EHR-based OSA research.
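
The case definition described above is simple enough to sketch in a few lines. The following is a minimal illustration, not the authors' implementation: the code list is copied from the abstract as printed, while the function name `classify_osa`, the input layout of `(code, date)` tuples, and the "indeterminate" label for patients with codes on too few dates are assumptions made for this example.

```python
from datetime import date

# Diagnostic codes exactly as printed in the abstract.
OSA_CODES = {
    "327.20", "327.23", "327.29", "780.51", "780.53", "780.57",
    "G4730", "G4733", "G4739",
}


def classify_osa(diagnoses, min_dates=2):
    """Classify one patient from an iterable of (code, date) tuples.

    Returns "case" when qualifying codes appear on at least `min_dates`
    separate dates, "noncase" when no qualifying code is present, and
    "indeterminate" otherwise (label assumed for this sketch).
    """
    qualifying_dates = {when for code, when in diagnoses if code in OSA_CODES}
    if len(qualifying_dates) >= min_dates:
        return "case"
    if not qualifying_dates:
        return "noncase"
    return "indeterminate"


# Hypothetical patient record: two qualifying codes on different dates.
example = [("327.23", date(2019, 3, 1)), ("G4733", date(2019, 9, 12))]
status = classify_osa(example)
```

Raising `min_dates` to 3 corresponds to the modified algorithm in the abstract, which trades a higher PPV for more patients left without a determinate label.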


Subject(s)
Electronic Health Records, Obstructive Sleep Apnea, Algorithms, Humans, International Classification of Diseases, Retrospective Studies, Obstructive Sleep Apnea/diagnosis
6.
Sci Rep ; 9(1): 6077, 2019 04 15.
Article in English | MEDLINE | ID: mdl-30988330

ABSTRACT

Benign prostatic hyperplasia (BPH) results in a significant public health burden due to the morbidity caused by the disease and many of the available remedies. As much as 70% of men over 70 will develop BPH. Few studies have been conducted to discover the genetic determinants of BPH risk. Understanding the biological basis for this condition may provide necessary insight for development of novel pharmaceutical therapies or risk prediction. We have evaluated SNP-based heritability of BPH in two cohorts and conducted a genome-wide association study (GWAS) of BPH risk using 2,656 cases and 7,763 controls identified from the Electronic Medical Records and Genomics (eMERGE) network. SNP-based heritability estimates suggest that roughly 60% of the phenotypic variation in BPH is accounted for by genetic factors. We used logistic regression to model BPH risk as a function of principal components of ancestry, age, and imputed genotype data, with meta-analysis performed using METAL. The top result was on chromosome 22 in SYN3 at rs2710383 (p-value = 4.6 × 10⁻⁷; Odds Ratio = 0.69, 95% confidence interval = 0.55-0.83). Other suggestive signals were near genes GLGC, UNCA13, SORCS1 and between BTBD3 and SPTLC3. We also evaluated genetically-predicted gene expression in prostate tissue. The most significant result was with increasing predicted expression of ETV4 (chr17; p-value = 0.0015). Overexpression of this gene has been associated with poor prognosis in prostate cancer. In conclusion, although there were no genome-wide significant variants identified for BPH susceptibility, we present evidence supporting the heritability of this phenotype, have identified suggestive signals, and evaluated the association between BPH and genetically-predicted gene expression in prostate.


Subject(s)
Genetic Predisposition to Disease, Inheritance Patterns, Prostatic Hyperplasia/genetics, Aged, Aged 80 and over, Biomarkers/metabolism, Case-Control Studies, Electronic Health Records/statistics & numerical data, Gene Expression Profiling, Genome-Wide Association Study, Genotyping Techniques, Humans, Male, Middle Aged, Single Nucleotide Polymorphism, Prostate/pathology, Prostatic Hyperplasia/epidemiology, Prostatic Hyperplasia/pathology
7.
NPJ Genom Med ; 4: 3, 2019.
Article in English | MEDLINE | ID: mdl-30774981

ABSTRACT

We conducted an electronic health record (EHR)-based phenome-wide association study (PheWAS) to discover pleiotropic effects of variants in three lipoprotein metabolism genes: PCSK9, APOB, and LDLR. Using high-density genotype data, we tested the associations of variants in the three genes with 1232 EHR-derived binary phecodes in 51,700 European-ancestry (EA) individuals and 585 phecodes in 10,276 African-ancestry (AA) individuals; 457 PCSK9, 730 APOB, and 720 LDLR variants were filtered by imputation quality (r² > 0.4), minor allele frequency (>1%), linkage disequilibrium (r² < 0.3), and association with LDL-C levels, yielding a set of two PCSK9, three APOB, and five LDLR variants in EA but no variants in AA. Cases and controls were defined for each phecode using the PheWAS package in R. Logistic regression assuming an additive genetic model was used with adjustment for age, sex, and the first two principal components. Significant associations were tested in additional cohorts from Vanderbilt University (n = 29,713), the Marshfield Clinic Personalized Medicine Research Project (n = 9562), and UK Biobank (n = 408,455). We identified one PCSK9, two APOB, and two LDLR variants significantly associated with an examined phecode. Only one of the variants was associated with a non-lipid disease phecode ("myopia"), but this association was not significant in the replication cohorts. In this large-scale PheWAS we did not find LDL-C-related variants in PCSK9, APOB, and LDLR to be associated with non-lipid-related phenotypes including diabetes, neurocognitive disorders, or cataracts.
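
The variant filters listed in the abstract (imputation quality, minor allele frequency, linkage disequilibrium) can be illustrated with a short sketch. The thresholds come from the abstract; everything else is an assumption made for illustration, including the `Variant` record, the function names, the greedy pruning order, and the idea of supplying pairwise LD as a lookup table. The abstract's further filter on association with LDL-C levels is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    rsid: str             # hypothetical identifier
    imputation_r2: float  # imputation quality score
    maf: float            # minor allele frequency


def passes_quality(v, min_r2=0.4, min_maf=0.01):
    """Keep variants with imputation r2 > 0.4 and MAF > 1%."""
    return v.imputation_r2 > min_r2 and v.maf > min_maf


def ld_prune(variants, pairwise_r2, max_r2=0.3):
    """Greedy LD pruning (assumed strategy, not stated in the abstract).

    A variant is kept only if its LD r2 with every already-kept variant
    is below max_r2. Pairs missing from pairwise_r2 are treated as
    unlinked (r2 = 0).
    """
    kept = []
    for v in variants:
        if all(pairwise_r2.get(frozenset((v.rsid, k.rsid)), 0.0) < max_r2
               for k in kept):
            kept.append(v)
    return kept


def filter_variants(variants, pairwise_r2):
    """Apply the quality filters first, then LD pruning."""
    return ld_prune([v for v in variants if passes_quality(v)], pairwise_r2)
```

Because greedy pruning depends on input order, a real pipeline would usually sort variants by some priority (for example, strength of association) before pruning; the abstract does not say how this was handled.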

8.
Nat Commun ; 9(1): 3522, 2018 08 30.
Article in English | MEDLINE | ID: mdl-30166544

ABSTRACT

Defining the full spectrum of human disease associated with a biomarker is necessary to advance the biomarker into clinical practice. We hypothesize that associating biomarker measurements with electronic health record (EHR) populations based on shared genetic architectures would establish the clinical epidemiology of the biomarker. We use Bayesian sparse linear mixed modeling to calculate SNP weightings for 53 biomarkers from the Atherosclerosis Risk in Communities study. We use the SNP weightings to compute predicted biomarker values in an EHR population and test associations with 1139 diagnoses. Here we report 116 associations meeting a Bonferroni level of significance. A false discovery rate (FDR)-based significance threshold reveals more known and undescribed associations across a broad range of biomarkers, including biometric measures, plasma proteins and metabolites, functional assays, and behaviors. We confirm an inverse association between LDL-cholesterol level and septicemia risk in an independent epidemiological cohort. This approach efficiently discovers biomarker-disease associations.
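
Applying SNP weightings to genotypes is, at its core, a weighted sum. The sketch below illustrates that step and a Bonferroni threshold. It does not reproduce the Bayesian sparse linear mixed model that produced the weights. The function names and the dictionary layout are hypothetical, and using 53 × 1139 as the number of tests is an assumption, since the abstract does not state the correction factor.

```python
def predicted_biomarker(dosages, snp_weights):
    """Weighted sum of allele dosages (0 to 2) for one individual.

    dosages:     {snp_id: dosage} for the individual
    snp_weights: {snp_id: weight} learned elsewhere (not modeled here)
    SNPs without a weight contribute nothing to the prediction.
    """
    return sum(snp_weights[snp] * dose
               for snp, dose in dosages.items()
               if snp in snp_weights)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


# Assumed test count: every biomarker against every diagnosis.
N_TESTS = 53 * 1139
THRESHOLD = bonferroni_threshold(0.05, N_TESTS)
```

Under that assumption, an association would need a p-value below `THRESHOLD` to count as Bonferroni-significant; the study's actual cut-off may differ.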


Subject(s)
Biomarkers/analysis, Electronic Health Records, Genome-Wide Association Study/methods, Bayes Theorem, Biomarkers/blood, LDL Cholesterol/blood, Humans, Prospective Studies, Risk Factors
9.
PLoS One ; 10(9): e0138677, 2015.
Article in English | MEDLINE | ID: mdl-26413716

ABSTRACT

INTRODUCTION: Liver enzyme levels and total serum bilirubin are under genetic control, and in recent years genome-wide population-based association studies have identified different susceptibility loci for these traits. We conducted a genome-wide association study in European ancestry participants from the Electronic Medical Records and Genomics (eMERGE) Network dataset of patient medical records with available genotyping data in order to identify genetic contributors to variability in serum bilirubin levels and other liver function tests and to compare the effects between adult and pediatric populations. METHODS: The process of whole genome imputation of eMERGE samples with standard quality control measures has been described previously. After removing missing data and outliers based on principal components (PC) analyses, 3294 samples from European ancestry were used for the GWAS. The association between each single nucleotide polymorphism (SNP) and total serum bilirubin and other liver function tests was tested using linear regression, adjusting for age, gender, site, platform and ancestry principal components (PC). RESULTS: Consistent with previous results, a strong association signal was detected for the UGT1A gene cluster (best SNP rs887829, beta = 0.15, p = 1.30 × 10⁻¹¹⁸) for total serum bilirubin level. Indeed, in this region more than 176 SNPs (or indels) had p < 10⁻⁸, spanning 150 kb on the long arm of chromosome 2 (2q37.1). In addition, we found an association of similar magnitude in a pediatric group (p = 8.26 × 10⁻⁴⁷, beta = 0.17). Further imputation using sequencing data as a reference panel revealed association of other markers including known TA7 repeat indels (rs8175347) (p = 9.78 × 10⁻¹¹⁷) and rs111741722 (p = 5.41 × 10⁻¹¹⁹), which were in proxy (r² = 0.99) with rs887829. Among rare variants, two Asian subjects homozygous for coding SNP rs4148323 (G71R) were identified.
Additional known effects for total serum bilirubin were also confirmed, including organic anion transporters SLCO1B1-SLCO1B3, TDRP and ZMYND8 at FDR<0.05 with no gene-gene interaction effects. Phenome-wide association studies (PheWAS) suggest a protective effect of TA7 repeat against cerebrovascular disease in an adult cohort (OR = 0.75, p = 0.0008). Among other liver function tests, we also confirmed the previous effect of the ABO blood group locus for variation in serum alkaline phosphatase (rs579459, p = 9.44 × 10⁻¹⁵). CONCLUSIONS: Taken together, our data present interesting findings with strong confirmation of previous effects by simply using the eMERGE electronic health record phenotyping. In addition, our findings indicate that, similar to the adult population, UGT1A1 is the main locus responsible for normal variation of serum bilirubin in pediatric populations.


Subject(s)
Electronic Health Records, Gene Regulatory Networks, Genome-Wide Association Study, Genomics, Liver Function Tests, Adult, Alkaline Phosphatase/blood, Bilirubin/blood, Case-Control Studies, Child, Cohort Studies, Demography, Female, Glucuronosyltransferase/genetics, Humans, Linkage Disequilibrium/genetics, Male, Single Nucleotide Polymorphism/genetics, White Population/genetics
10.
Int J Biomed Data Min ; 4(1)2015 Dec.
Article in English | MEDLINE | ID: mdl-27054044

ABSTRACT

BACKGROUND AND OBJECTIVE: We designed an algorithm to identify abdominal aortic aneurysm cases and controls from electronic health records to be shared and executed within the "electronic Medical Records and Genomics" (eMERGE) Network. MATERIALS AND METHODS: Structured Query Language was used to script the algorithm utilizing "Current Procedural Terminology" and "International Classification of Diseases" codes, with demographic and encounter data to classify individuals as case, control, or excluded. The algorithm was validated using blinded manual chart review at three eMERGE Network sites and one non-eMERGE Network site. Validation comprised evaluation of an equal number of predicted cases and controls selected at random from the algorithm predictions. After validation at the three eMERGE Network sites, the remaining eMERGE Network sites performed verification only. Finally, the algorithm was implemented as a workflow in the Konstanz Information Miner, which represented the logic graphically while retaining intermediate data for inspection at each node. The algorithm was configured to be independent of specific access to data and was exportable (without data) to other sites. RESULTS: The algorithm demonstrated positive predictive values (PPV) of 92.8% (CI: 86.8-96.7) and 100% (CI: 97.0-100) for cases and controls, respectively. It also performed well outside the eMERGE Network. Implementation of the transportable executable algorithm as a Konstanz Information Miner workflow required much less effort than implementation from pseudo code, and ensured that the logic was as intended. DISCUSSION AND CONCLUSION: This ePhenotyping algorithm identifies abdominal aortic aneurysm cases and controls from the electronic health record with the high case and control PPV necessary for research purposes, can be disseminated easily, and can be applied to high-throughput genetic and other studies.
