ABSTRACT
BACKGROUND: ECG QRS duration, a measure of cardiac intraventricular conduction, varies ≈2-fold in individuals without cardiac disease. Slow conduction may promote re-entrant arrhythmias. METHODS AND RESULTS: We performed a genome-wide association study to identify genomic markers of QRS duration in 5272 individuals without cardiac disease selected from electronic medical record algorithms at 5 sites in the Electronic Medical Records and Genomics (eMERGE) network. The most significant loci were evaluated within the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium QRS genome-wide association study meta-analysis. Twenty-three single-nucleotide polymorphisms in 5 loci, previously described by CHARGE, were replicated in the eMERGE samples; 18 single-nucleotide polymorphisms were in the chromosome 3 SCN5A and SCN10A loci, where the most significant single-nucleotide polymorphisms were rs1805126 in SCN5A with P=1.2×10(-8) (eMERGE) and P=2.5×10(-20) (CHARGE) and rs6795970 in SCN10A with P=6×10(-6) (eMERGE) and P=5×10(-27) (CHARGE). The other loci were in NFIA, near CDKN1A, and near C6orf204. We then performed phenome-wide association studies on variants in these 5 loci in 13859 European Americans to search for diagnoses associated with these markers. Phenome-wide association study identified atrial fibrillation and cardiac arrhythmias as the most common associated diagnoses with SCN10A and SCN5A variants. SCN10A variants were also associated with subsequent development of atrial fibrillation and arrhythmia in the original 5272 "heart-healthy" study population. CONCLUSIONS: We conclude that DNA biobanks coupled to electronic medical records not only provide a platform for genome-wide association study but also may allow broad interrogation of the longitudinal incidence of disease associated with genetic variants. 
The phenome-wide association study approach implicated sodium channel variants modulating QRS duration in subjects without cardiac disease as predictors of subsequent arrhythmias.
Subject(s)
Arrhythmias, Cardiac/diagnosis , Arrhythmias, Cardiac/genetics , Genetic Markers/genetics , Genome-Wide Association Study/methods , Heart Conduction System/physiopathology , Heart Rate/genetics , Adult , Aged , Aged, 80 and over , Arrhythmias, Cardiac/epidemiology , Female , Heart Conduction System/metabolism , Humans , Male , Middle Aged , Phenotype , Polymorphism, Single Nucleotide/genetics , Risk Factors
ABSTRACT
We repurposed existing genotypes in DNA biobanks across the Electronic Medical Records and Genomics network to perform a genome-wide association study for primary hypothyroidism, the most common thyroid disease. Electronic selection algorithms incorporating billing codes, laboratory values, text queries, and medication records identified 1317 cases and 5053 controls of European ancestry within five electronic medical records (EMRs); the algorithms' positive predictive values were 92.4% and 98.5% for cases and controls, respectively. Four single-nucleotide polymorphisms (SNPs) in linkage disequilibrium at 9q22 near FOXE1 were associated with hypothyroidism at genome-wide significance, the strongest being rs7850258 (odds ratio [OR] 0.74, p = 3.96 × 10(-9)). This association was replicated in a set of 263 cases and 1616 controls (OR = 0.60, p = 5.7 × 10(-6)). A phenome-wide association study (PheWAS) that was performed on this locus with 13,617 individuals and more than 200,000 patient-years of billing data identified associations with additional phenotypes: thyroiditis (OR = 0.58, p = 1.4 × 10(-5)), nodular (OR = 0.76, p = 3.1 × 10(-5)) and multinodular (OR = 0.69, p = 3.9 × 10(-5)) goiters, and thyrotoxicosis (OR = 0.76, p = 1.5 × 10(-3)), but not Graves disease (OR = 1.03, p = 0.82). Thyroid cancer, previously associated with this locus, was not significantly associated in the PheWAS (OR = 1.29, p = 0.09). The strongest association in the PheWAS was hypothyroidism (OR = 0.76, p = 2.7 × 10(-13)), which had an odds ratio that was nearly identical to that of the curated case-control population in the primary analysis, providing further validation of the PheWAS method. Our findings indicate that EMR-linked genomic data could allow discovery of genes associated with many diseases without additional genotyping cost.
Subject(s)
Forkhead Transcription Factors/genetics , Hypothyroidism/genetics , Aged , Algorithms , Female , Genetic Markers , Genetic Variation , Genome , Genome-Wide Association Study , Genotype , Humans , Male , Medical Records Systems, Computerized , Middle Aged , Phenotype , Predictive Value of Tests
ABSTRACT
Electrocardiographic (ECG) measurements vary by ancestry. Genome-wide association studies (GWAS) have identified loci that contribute to ECG measurements; however, most have been performed in Europeans drawn from population-based cohorts or surveys. The strongest associations reported are in NOS1AP with QT interval and SCN10A with PR and QRS durations. The extent to which these associations can be generalized to African Americans has yet to be determined. Using electronic medical records, PR and QT intervals, QRS duration, and heart rate were determined in 455 African Americans as part of the Vanderbilt Genome-Electronic Records Project and Northwestern University NUgene Project. We tested for an association between these ECG traits and >930K SNPs. We identified a total of 36 novel associations with PR interval, QRS duration, QT interval, and heart rate at p < 1.0 × 10(-6). Using published GWAS data, we compared our results with those previously identified in other populations. Five associations originally identified in other populations generalized with respect to statistical significance and direction of effect. A total of 43 associations have a consistent direction of effect with European and/or Asian populations. This work provides a catalogue of generalized versus nongeneralized associations, a necessary step in prioritizing GWAS-identified regions for further fine-mapping in diverse populations.
Subject(s)
Black or African American/genetics , Electrocardiography , Genetic Variation , Genome-Wide Association Study , Quantitative Trait, Heritable , Adult , Alleles , Chromosome Mapping , Ethnicity/genetics , Female , Gene Frequency , Genetic Association Studies , Genotype , Humans , Male , Middle Aged , Polymorphism, Single Nucleotide , White People/genetics
ABSTRACT
Large-scale DNA databanks linked to electronic medical record (EMR) systems have been proposed as an approach for rapidly generating large, diverse cohorts for discovery and replication of genotype-phenotype associations. However, the extent to which such resources are capable of delivering on this promise is unknown. We studied whether an EMR-linked DNA biorepository can be used to detect known genotype-phenotype associations for five diseases. Twenty-one SNPs previously implicated as common variants predisposing to atrial fibrillation, Crohn disease, multiple sclerosis, rheumatoid arthritis, or type 2 diabetes were successfully genotyped in 9483 samples accrued over 4 mo into BioVU, the Vanderbilt University Medical Center DNA biobank. Previously reported odds ratios (OR(PR)) ranged from 1.14 to 2.36. For each phenotype, natural language processing techniques and billing-code queries were used to identify cases (n = 70-698) and controls (n = 808-3818) from deidentified health records. Each of the 21 tests of association yielded point estimates in the expected direction. Previous genotype-phenotype associations were replicated (p < 0.05) in 8/14 cases when the OR(PR) was > 1.25, and in 0/7 with lower OR(PR). Statistically significant associations were detected in all analyses that were adequately powered. In each of the five diseases studied, at least one previously reported association was replicated. These data demonstrate that phenotypes representing clinical diagnoses can be extracted from EMR systems, and they support the use of DNA resources coupled to EMR systems as tools for rapid generation of large data sets required for replication of associations found in research cohorts and for discovery in genome science.
Subject(s)
Arthritis, Rheumatoid/genetics , Atrial Fibrillation/genetics , Crohn Disease/genetics , Diabetes Mellitus, Type 2/genetics , Electronic Health Records , Genetic Association Studies/trends , Multiple Sclerosis/genetics , Case-Control Studies , DNA/blood , DNA/genetics , Genome, Human , Genome-Wide Association Study , Genotype , Humans , Phenotype , Polymorphism, Single Nucleotide/genetics
ABSTRACT
The Electronic Medical Records and Genomics Network is a National Human Genome Research Institute-funded consortium engaged in the development of methods and best practices for using the electronic medical record as a tool for genomic research. Now in its sixth year and second funding cycle, and comprising nine research groups and a coordinating center, the network has played a major role in validating the concept that clinical data derived from electronic medical records can be used successfully for genomic research. Current work is advancing knowledge in multiple disciplines at the intersection of genomics and health-care informatics, particularly for electronic phenotyping, genome-wide association studies, genomic medicine implementation, and the ethical and regulatory issues associated with genomics research and returning results to study participants. Here, we describe the evolution, accomplishments, opportunities, and challenges of the network from its inception as a five-group consortium focused on genotype-phenotype associations for genomic discovery to its current form as a nine-group consortium pivoting toward the implementation of genomic medicine.
Subject(s)
Electronic Health Records , Genetic Research , Genomics , Electronic Health Records/trends , Genetic Research/ethics , Genome-Wide Association Study , Genomics/ethics , Genomics/trends , Genotype , Humans , National Human Genome Research Institute (U.S.) , Phenotype , Precision Medicine , United States
ABSTRACT
Recently, large scale genomic projects such as All of Us and the UK Biobank have introduced a new research paradigm where data are stored centrally in cloud-based Trusted Research Environments (TREs). To characterize the advantages and drawbacks of different TRE attributes in facilitating cross-cohort analysis, we conduct a Genome-Wide Association Study of standard lipid measures using two approaches: meta-analysis and pooled analysis. Comparison of full summary data from both approaches with an external study shows strong correlation of known loci with lipid levels (R2 ~ 83-97%). Importantly, 90 variants meet the significance threshold only in the meta-analysis and 64 variants are significant only in pooled analysis, with approximately 20% of variants in each of those groups being most prevalent in non-European, non-Asian ancestry individuals. These findings have important implications, as technical and policy choices lead to cross-cohort analyses generating similar, but not identical results, particularly for non-European ancestral populations.
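The meta-analysis arm of such a comparison can be illustrated with a minimal sketch. It assumes a fixed-effect, inverse-variance weighted combination of per-cohort effect estimates, which is a common choice but is not stated in the abstract; the function names and the effect sizes below are hypothetical.

```python
import math

def inverse_variance_meta(estimates):
    """Combine per-cohort (beta, standard error) pairs with fixed-effect weights."""
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    return beta, se

def two_sided_p(beta, se):
    """Two-sided p-value from a normal approximation to beta / se."""
    z = abs(beta / se)
    return math.erfc(z / math.sqrt(2.0))

# Hypothetical effect sizes for one variant in two cohorts (not from the study).
cohort_estimates = [(0.10, 0.02), (0.20, 0.04)]
beta, se = inverse_variance_meta(cohort_estimates)
print(beta, se, two_sided_p(beta, se))
```

A pooled analysis would instead fit one model to the combined individual-level data, so a variant near the significance threshold can fall on different sides of it under the two approaches.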
Subject(s)
Genome-Wide Association Study , Population Health , Humans , Genomics , Policy , Lipids
ABSTRACT
The All of Us Research Program's Data and Research Center (DRC) was established to help acquire, curate, and provide access to one of the world's largest and most diverse datasets for precision medicine research. Already, over 500,000 participants are enrolled in All of Us, 80% of whom are underrepresented in biomedical research, and data are being analyzed by a community of over 2,300 researchers. The DRC created this thriving data ecosystem by collaborating with engaged participants, innovative program partners, and empowered researchers. In this review, we first describe how the DRC is organized to meet the needs of this broad group of stakeholders. We then outline guiding principles, common challenges, and innovative approaches used to build the All of Us data ecosystem. Finally, we share lessons learned to help others navigate important decisions and trade-offs in building a modern biomedical data platform.
Subject(s)
Biomedical Research , Population Health , Humans , Ecosystem , Precision Medicine
ABSTRACT
BACKGROUND: Recent genome-wide association studies in which selected community populations are used have identified genomic signals in SCN10A influencing PR duration. The extent to which this can be demonstrated in cohorts derived from electronic medical records is unknown. METHODS AND RESULTS: We performed a genome-wide association study on 2334 European American patients with normal ECGs without evidence of prior heart disease from the Vanderbilt DNA databank, BioVU, which accrues subjects from routine patient care. Subjects were identified by combinations of natural language processing, laboratory queries, and billing code queries of deidentified medical record data. Subjects were 58% female, of mean (± SD) age 54 ± 15 years, and had mean PR intervals of 158 ± 18 ms. Genotyping was performed with the use of the Illumina Human660W-Quad platform. Our results identify 4 single nucleotide polymorphisms (rs6800541, rs6795970, rs6798015, rs7430477) linked to SCN10A associated with PR interval (P=5.73 × 10(-7) to 1.78 × 10(-6)). CONCLUSIONS: This genome-wide association study confirms a gene heretofore not implicated in cardiac pathophysiology as a modulator of PR interval in humans. This study is one of the first replication genome-wide association studies performed with the use of an electronic medical records-derived cohort, supporting their further use for genotype-phenotype analyses.
Subject(s)
Databases, Nucleic Acid , Electrocardiography , Electronic Health Records , Heart/physiopathology , Polymorphism, Single Nucleotide , Sodium Channels/genetics , Adult , Aged , Female , Genome-Wide Association Study , Genotype , Humans , Male , Middle Aged , NAV1.8 Voltage-Gated Sodium Channel
ABSTRACT
MOTIVATION: Emergence of genetic data coupled to longitudinal electronic medical records (EMRs) offers the possibility of phenome-wide association scans (PheWAS) for disease-gene associations. We propose a novel method to scan phenomic data for genetic associations using International Classification of Disease (ICD9) billing codes, which are available in most EMR systems. We have developed a code translation table to automatically define 776 different disease populations and their controls using prevalent ICD9 codes derived from EMR data. As a proof of concept of this algorithm, we genotyped the first 6005 European-Americans accrued into BioVU, Vanderbilt's DNA biobank, at five single nucleotide polymorphisms (SNPs) with previously reported disease associations: atrial fibrillation, Crohn's disease, carotid artery stenosis, coronary artery disease, multiple sclerosis, systemic lupus erythematosus and rheumatoid arthritis. The PheWAS software generated cases and control populations across all ICD9 code groups for each of these five SNPs, and disease-SNP associations were analyzed. The primary outcome of this study was replication of seven previously known SNP-disease associations for these SNPs. RESULTS: Four of seven known SNP-disease associations using the PheWAS algorithm were replicated with P-values between 2.8 x 10(-6) and 0.011. The PheWAS algorithm also identified 19 previously unknown statistical associations between these SNPs and diseases at P < 0.01. This study indicates that PheWAS analysis is a feasible method to investigate SNP-disease associations. Further evaluation is needed to determine the validity of these associations and the appropriate statistical thresholds for clinical significance. AVAILABILITY: The PheWAS software and code translation table are freely available at http://knowledgemap.mc.vanderbilt.edu/research.
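The case/control definition described above can be sketched as follows. This is a simplified illustration, not the published PheWAS software: the three-row translation table, the prefix-matching rule, the carrier/non-carrier coding, and all function names are assumptions made for the example, and the control-exclusion rules of the real translation table are omitted.

```python
# Hypothetical translation table: ICD9 code prefix -> phenotype group.
# The table described in the abstract defines 776 groups; these rows are illustrative.
CODE_TO_PHENOTYPE = {
    "427.31": "atrial fibrillation",
    "555": "Crohn's disease",
    "714.0": "rheumatoid arthritis",
}

def phenotypes_for(icd9_codes):
    """Return the set of phenotype groups matched by a patient's ICD9 codes."""
    found = set()
    for code in icd9_codes:
        for prefix, phenotype in CODE_TO_PHENOTYPE.items():
            if code.startswith(prefix):
                found.add(phenotype)
    return found

def phewas_tables(patients):
    """Build one 2x2 table per phenotype: rows case/control, columns carrier/non-carrier."""
    tables = {p: [[0, 0], [0, 0]] for p in set(CODE_TO_PHENOTYPE.values())}
    for patient in patients:
        matched = phenotypes_for(patient["icd9"])
        col = 0 if patient["carrier"] else 1
        for phenotype, table in tables.items():
            # Simplification: anyone without a matching code counts as a control.
            row = 0 if phenotype in matched else 1
            table[row][col] += 1
    return tables

def odds_ratio(table):
    """Odds ratio of a 2x2 table, with a 0.5 continuity correction for empty cells."""
    (a, b), (c, d) = table
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))
```

Running the same loop for every genotyped SNP gives one association estimate per SNP and phenotype group, which is the scan the abstract describes.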
Subject(s)
Computational Biology/methods , Algorithms , Arthritis, Rheumatoid/genetics , Atrial Fibrillation/genetics , Carotid Stenosis/genetics , Coronary Artery Disease/genetics , Crohn Disease/genetics , Europe , Genotype , Humans , Lupus Erythematosus, Systemic/genetics , Multiple Sclerosis/genetics , Polymorphism, Single Nucleotide , Risk Factors , Software
ABSTRACT
We describe a two-stage analytical approach for characterizing morbidity profile dissimilarity among patient cohorts using electronic medical records. We capture morbidities using the International Statistical Classification of Diseases and Related Health Problems (ICD-9) codes. In the first stage of the approach separate logistic regression analyses for ICD-9 sections (e.g., "hypertensive disease" or "appendicitis") are conducted, and the odds ratios that describe adjusted differences in prevalence between two cohorts are displayed graphically. In the second stage, the results from ICD-9 section analyses are combined into a general morbidity dissimilarity index (MDI). For illustration, we examine nine cohorts of patients representing six phenotypes (or controls) derived from five institutions, each a participant in the electronic MEdical REcords and GEnomics (eMERGE) network. The phenotypes studied include type II diabetes and type II diabetes controls, peripheral arterial disease and peripheral arterial disease controls, normal cardiac conduction as measured by electrocardiography, and senile cataracts.
Subject(s)
Electronic Health Records , Morbidity , Cohort Studies , Diabetes Mellitus, Type 2/epidemiology , Humans , International Classification of Diseases , Peripheral Arterial Disease/epidemiology , Phenotype , Prevalence , United States
ABSTRACT
Resistant hypertension is defined as high blood pressure that remains above treatment goals in spite of the concurrent use of three antihypertensive agents from different classes. Despite the important health consequences of resistant hypertension, few studies of resistant hypertension have been conducted. To perform a genome-wide association study for resistant hypertension, we defined and identified cases of resistant hypertension and hypertensives with treated, controlled hypertension among >47,500 adults residing in the US linked to electronic health records (EHRs) and genotyped as part of the electronic MEdical Records & GEnomics (eMERGE) Network. Electronic selection logic using billing codes, laboratory values, text queries, and medication records was used to identify resistant hypertension cases and controls at each site, and a total of 3,006 cases of resistant hypertension and 876 controlled hypertensives were identified among eMERGE Phase I and II sites. After imputation and quality control, a total of 2,530,150 SNPs were tested for an association among 2,830 multi-ethnic cases of resistant hypertension and 876 controlled hypertensives. No test of association was genome-wide significant in the full dataset or in the dataset limited to European American cases (n = 1,719) and controls (n = 708). The most significant finding was CLNK rs13144136 at p = 1.00 × 10(-6) (odds ratio = 0.68; 95% CI = 0.58-0.80) in the full dataset with similar results in the European American only dataset. We also examined whether SNPs known to influence blood pressure or hypertension also influenced resistant hypertension. None was significant after correction for multiple testing. These data highlight both the difficulties and the potential utility of EHR-linked genomic data to study clinically-relevant traits such as resistant hypertension.
Subject(s)
Antihypertensive Agents/therapeutic use , Drug Resistance/genetics , Electronic Health Records , Genome-Wide Association Study , Hypertension/genetics , Adult , Aged , Algorithms , Blood Pressure/genetics , Case-Control Studies , Computer Communication Networks , Datasets as Topic , Ethnicity/genetics , Genotype , Humans , Hypertension/drug therapy , Hypertension/epidemiology , Male , Middle Aged , Polymorphism, Single Nucleotide , Risk Factors
ABSTRACT
OBJECTIVES: We describe the development, implementation, and evaluation of a model to pre-emptively select patients for genotyping based on medication exposure risk. STUDY DESIGN AND SETTING: Using deidentified electronic health records, we derived a prognostic model for the prescription of statins, warfarin, or clopidogrel. The model was implemented into a clinical decision support (CDS) tool to recommend pre-emptive genotyping for patients exceeding a prescription risk threshold. We evaluated the rule on an independent validation cohort and on an implementation cohort, representing the population in which the CDS tool was deployed. RESULTS: The model exhibited moderate discrimination with area under the receiver operator characteristic curves ranging from 0.68 to 0.75 at 1 and 2 years after index dates. Risk estimates tended to underestimate true risk. The cumulative incidences of medication prescriptions at 1 and 2 years were 0.35 and 0.48, respectively, among 1,673 patients flagged by the model. The cumulative incidences in the same number of randomly sampled subjects were 0.12 and 0.19, and in patients over 50 years with the highest body mass indices, they were 0.22 and 0.34. CONCLUSION: We demonstrate that prognostic algorithms can guide pre-emptive pharmacogenetic testing toward those likely to benefit from it.
Subject(s)
Drug Utilization/statistics & numerical data , Electronic Health Records/organization & administration , Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use , Pharmacogenetics/organization & administration , Ticlopidine/analogs & derivatives , Warfarin/therapeutic use , Adult , Age Factors , Aged , Clopidogrel , Decision Support Systems, Clinical , Female , Humans , Longitudinal Studies , Male , Middle Aged , Models, Statistical , Predictive Value of Tests , Prognosis , Program Evaluation , Proportional Hazards Models , Reproducibility of Results , Risk Factors , Sex Factors , Ticlopidine/therapeutic use , United States
ABSTRACT
Type 2 diabetes (T2D) is a complex metabolic disease that disproportionately affects African Americans. Genome-wide association studies (GWAS) have identified several loci that contribute to T2D in European Americans, but few studies have been performed in admixed populations. We first performed a GWAS of 1,563 African Americans from the Vanderbilt Genome-Electronic Records Project and Northwestern University NUgene Project as part of the electronic Medical Records and Genomics (eMERGE) network. We successfully replicate an association in TCF7L2, previously identified by GWAS, in this African American dataset. We were unable to identify novel associations at p<5.0×10(-8) by GWAS. Using admixture mapping as an alternative method for discovery, we performed a genome-wide admixture scan that suggests multiple candidate genes associated with T2D. One finding, TCIRG1, is a T-cell immune regulator expressed in the pancreas and liver that has not been previously implicated for T2D. We performed subsequent fine-mapping to further assess the association between TCIRG1 and T2D in >5,000 African Americans. We identified 13 independent associations between TCIRG1, CHKA, and ALDH3B1 genes on chromosome 11 and T2D. Our results suggest a novel region on chromosome 11 identified by admixture mapping is associated with T2D in African Americans.
Subject(s)
Black People/genetics , Chromosome Mapping/methods , Chromosomes, Human, Pair 11 , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/ethnology , Genome-Wide Association Study , Humans , Linkage Disequilibrium , Polymorphism, Single Nucleotide
ABSTRACT
The NAv1.5 sodium channel α subunit is the predominant α-subunit expressed in the heart and is associated with cardiac arrhythmias. We tested five previously identified SCN5A variants (rs7374138, rs7637849, rs7637849, rs7629265, and rs11129796) for an association with PR interval and QRS duration in two unique study populations: the Third National Health and Nutrition Examination Survey (NHANES III, n= 552) accessed by the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) and a combined dataset (n= 455) from two biobanks linked to electronic medical records from Vanderbilt University (BioVU) and Northwestern University (NUgene) as part of the electronic Medical Records & Genomics (eMERGE) network. A meta-analysis including all three study populations (n~4,000) suggests that eight SCN5A associations were significant for both QRS duration and PR interval (p<5.0E-3) with little evidence for heterogeneity across the study populations. These results suggest that published SCN5A associations replicate across different study designs in a meta-analysis and represent an important first step in utility of multiple study designs for genetic studies and the identification/characterization of genetic variants associated with ECG traits in African-descent populations.
ABSTRACT
Thyroid stimulating hormone (TSH) levels are normally tightly regulated within an individual; thus, relatively small variations may indicate thyroid disease. Genome-wide association studies (GWAS) have identified variants in PDE8B and FOXE1 that are associated with TSH levels. However, prior studies lacked racial/ethnic diversity, limiting the generalization of these findings to individuals of non-European ethnicities. The Electronic Medical Records and Genomics (eMERGE) Network is a collaboration across institutions with biobanks linked to electronic medical records (EMRs). The eMERGE Network uses EMR-derived phenotypes to perform GWAS in diverse populations for a variety of phenotypes. In this report, we identified serum TSH levels from 4,501 European American and 351 African American euthyroid individuals in the eMERGE Network with existing GWAS data. Tests of association were performed using linear regression and adjusted for age, sex, body mass index (BMI), and principal components, assuming an additive genetic model. Our results replicate the known association of PDE8B with serum TSH levels in European Americans (rs2046045 p = 1.85 × 10(-17), β = 0.09). FOXE1 variants, associated with hypothyroidism, were not genome-wide significant (rs10759944: p = 1.08 × 10(-6), β = -0.05). No SNPs reached genome-wide significance in African Americans. However, multiple known associations with TSH levels in European ancestry were nominally significant in African Americans, including PDE8B (rs2046045 p = 0.03, β = -0.09), VEGFA (rs11755845 p = 0.01, β = -0.13), and NFIA (rs334699 p = 1.50 × 10(-3), β = -0.17). We found little evidence that SNPs previously associated with other thyroid-related disorders were associated with serum TSH levels in this study. These results support the previously reported association between PDE8B and serum TSH levels in European Americans and emphasize the need for additional genetic studies in more diverse populations.
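Under an additive genetic model, each genotype is coded as the number of copies of the tested allele (0, 1, or 2), and the reported β is the estimated change in the trait per additional copy. The sketch below shows that coding with an unadjusted least-squares slope; it deliberately leaves out the covariates named in the abstract (age, sex, BMI, principal components), and the function name and the data are hypothetical.

```python
def additive_beta(dosages, trait_values):
    """Least-squares slope of a trait on allele dosage (0, 1, or 2 copies)."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(trait_values) / n
    # Slope = covariance(dosage, trait) / variance(dosage).
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait_values))
    return sxy / sxx

# Hypothetical allele counts and trait values (not data from the study).
dosages = [0, 1, 2, 0, 1, 2]
trait = [1.0, 1.1, 1.2, 1.0, 1.1, 1.2]
print(additive_beta(dosages, trait))
```

A covariate-adjusted analysis would fit the same dosage term inside a multiple regression rather than on its own.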
Subject(s)
Black or African American/genetics , Polymorphism, Single Nucleotide , Thyrotropin/blood , White People/genetics , Africa/ethnology , Aged , Aged, 80 and over , Body Mass Index , Electronic Health Records , Europe/ethnology , Female , Genome-Wide Association Study , Humans , Male , Middle Aged , Thyroid Diseases/blood , Thyroid Diseases/genetics
ABSTRACT
BACKGROUND: The ADME Core Panel assays 184 variants across 34 pharmacogenes, many of which are difficult to accurately genotype with standard multiplexing methods. METHODS: We genotyped 326 frequently medicated individuals of European descent in Vanderbilt's biorepository linked to de-identified electronic medical records, BioVU, on the ADME Core Panel to assess quality and performance of the assay. We compared quality control metrics and determined the extent of direct and indirect marker overlap between the ADME Core Panel and the Illumina Omni1-Quad. RESULTS: We found the quality of the ADME Core Panel data to be high, with exceptions in select copy number variants and markers in certain genes (notably CYP2D6). Most of the common variants on the ADME panel are genotyped by the Omni1, but absent rare variants and copy number variants could not be accurately tagged by single markers. CONCLUSION: Our frequently medicated study population did not convincingly differ in allele frequency from reference populations, suggesting that heterogeneous clinical samples (with respect to medications) have similar allele frequency distributions in pharmacogenetics genes compared with reference populations.
Subject(s)
Electronic Health Records , Genetic Markers/genetics , Pharmacogenetics , Polypharmacy , Adult , Aged , Aged, 80 and over , Cytochrome P-450 CYP2D6/genetics , DNA Copy Number Variations , Female , Gene Frequency , Genome-Wide Association Study/methods , Genotype , Humans , Male , Middle Aged , Polymorphism, Single Nucleotide , White People/genetics , Young Adult
ABSTRACT
Genetic association studies have rapidly become a major tool for identifying the genetic basis of common human diseases. The advent of cost-effective genotyping coupled with large collections of samples linked to clinical outcomes and quantitative traits now makes it possible to systematically characterize genotype-phenotype relationships in diverse populations and extensive datasets. To capitalize on these advancements, the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) project, as part of the collaborative Population Architecture using Genomics and Epidemiology (PAGE) study, accesses two collections: the National Health and Nutrition Examination Surveys (NHANES) and BioVU, Vanderbilt University's biorepository linked to de-identified electronic medical records. We describe herein the workflows for accessing and using the epidemiologic (NHANES) and clinical (BioVU) collections, where each workflow has been customized to reflect the content and data access limitations of each respective source. We also describe the process by which these data are generated, standardized, and shared for meta-analysis among the PAGE study sites. As a specific example of the use of BioVU, we describe the data mining efforts to define cases and controls for genetic association studies of common cancers in PAGE. Collectively, the efforts described here are a generalized outline for many of the successful approaches that can be used in the era of high-throughput genotype-phenotype associations for moving biomedical discovery forward to new frontiers of data generation and analysis.
Subject(s)
Gene-Environment Interaction , Genetic Association Studies/statistics & numerical data , Computational Biology , Databases, Nucleic Acid/statistics & numerical data , Genetics, Population/statistics & numerical data , High-Throughput Screening Assays/statistics & numerical data , Humans , Linear Models , Neoplasms/genetics , Nutrition Surveys/statistics & numerical data , Polymorphism, Single Nucleotide , Registries/statistics & numerical data
ABSTRACT
Candidate gene and genome-wide association studies (GWAS) have identified genetic variants that modulate risk for human disease; many of these associations require further study to replicate the results. Here we report the first large-scale application of the phenome-wide association study (PheWAS) paradigm within electronic medical records (EMRs), an unbiased approach to replication and discovery that interrogates relationships between targeted genotypes and multiple phenotypes. We scanned for associations between 3,144 single-nucleotide polymorphisms (previously implicated by GWAS as mediators of human traits) and 1,358 EMR-derived phenotypes in 13,835 individuals of European ancestry. This PheWAS replicated 66% (51/77) of sufficiently powered prior GWAS associations and revealed 63 potentially pleiotropic associations with P < 4.6 × 10(-6) (false discovery rate < 0.1); the strongest of these novel associations were replicated in an independent cohort (n = 7,406). These findings validate PheWAS as a tool to allow unbiased interrogation across multiple phenotypes in EMR-based cohorts and to enhance analysis of the genomic basis of human disease.
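A P-value cutoff tied to a false discovery rate, as reported above, can be derived from the full list of test results. The sketch below uses the Benjamini-Hochberg step-up procedure as an assumption; the abstract does not say which false discovery rate method was applied, and the function name and P-values are hypothetical.

```python
def benjamini_hochberg_threshold(p_values, q=0.1):
    """Largest P-value declared significant at false discovery rate q (0.0 if none)."""
    m = len(p_values)
    threshold = 0.0
    for rank, p in enumerate(sorted(p_values), start=1):
        # Step-up rule: keep the largest p with p <= q * rank / m.
        if p <= q * rank / m:
            threshold = p
    return threshold

# Hypothetical P-values from five tests (not results from the study).
example = [0.001, 0.008, 0.039, 0.041, 0.6]
print(benjamini_hochberg_threshold(example, q=0.1))
```

Every test with a P-value at or below the returned threshold is treated as significant at the chosen rate.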
Subject(s)
Electronic Health Records/statistics & numerical data , Genetic Predisposition to Disease/epidemiology , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Genome-Wide Association Study/statistics & numerical data , Medical Record Linkage/methods , Polymorphism, Single Nucleotide/genetics , Chromosome Mapping/methods , Data Mining/methods , Humans , Phenotype
ABSTRACT
AIM: Warfarin pharmacogenomic algorithms reduce dosing error, but perform poorly in non-European-Americans. Electronic health record (EHR) systems linked to biobanks may allow for pharmacogenomic analysis, but they have not yet been used for this purpose. PATIENTS & METHODS: We used BioVU, the Vanderbilt EHR-linked DNA repository, to identify European-Americans (n = 1022) and African-Americans (n = 145) on stable warfarin therapy and evaluated the effect of 15 pharmacogenetic variants on stable warfarin dose. RESULTS: Associations between variants in VKORC1, CYP2C9 and CYP4F2 with weekly dose were observed in European-Americans as well as additional variants in CYP2C9 and CALU in African-Americans. Compared with traditional 5 mg/day dosing, implementing the US FDA recommendations or the International Warfarin Pharmacogenomics Consortium (IWPC) algorithm reduced error in weekly dose in European-Americans (from 13.5 to 12.4 and 9.5 mg/week, respectively) but less so in African-Americans (from 15.2 to 15.0 and 13.8 mg/week, respectively). By further incorporating associated variants specific for European-Americans and African-Americans in an expanded algorithm, dose-prediction error was reduced to 9.1 mg/week (95% CI: 8.4-9.6) in European-Americans and 12.4 mg/week (95% CI: 10.0-13.2) in African-Americans. The expanded algorithm explained 41 and 53% of dose variation in African-Americans and European-Americans, respectively, compared with 29 and 50%, respectively, for the IWPC algorithm. Implementing these predictions via dispensable pill regimens similarly reduced dosing error. CONCLUSION: These results validate EHR-linked DNA biorepositories as real-world resources for pharmacogenomic validation and discovery.
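The abstract compares dosing strategies by their error in weekly dose without defining the error metric. The sketch below assumes it is the mean absolute difference between predicted and observed stable weekly dose; the function name and all dose values are hypothetical and serve only to show how a fixed 5 mg/day regimen (35 mg/week) would be compared with an algorithm's predictions.

```python
def mean_absolute_error(predicted, observed):
    """Mean absolute difference between predicted and observed weekly
    doses in mg/week. (Assumed metric; the abstract does not define it.)"""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


# Traditional fixed dosing: 5 mg/day for 7 days.
FIXED_WEEKLY_DOSE_MG = 5 * 7

# Hypothetical observed stable doses and algorithm predictions (mg/week).
observed_doses = [21.0, 35.0, 52.5, 28.0]
algorithm_doses = [24.0, 33.0, 45.0, 30.0]
fixed_doses = [FIXED_WEEKLY_DOSE_MG] * len(observed_doses)

fixed_error = mean_absolute_error(fixed_doses, observed_doses)
algorithm_error = mean_absolute_error(algorithm_doses, observed_doses)
```

A lower value for the algorithm than for fixed dosing corresponds to the kind of reduction in dosing error the abstract describes.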
Subject(s)
Anticoagulants/administration & dosage , Black or African American/genetics , Dose-Response Relationship, Drug , Warfarin/administration & dosage , White People/genetics , Adult , Aged , Aged, 80 and over , Aryl Hydrocarbon Hydroxylases/genetics , Calcium-Binding Proteins/genetics , Cytochrome P-450 CYP2C9 , Cytochrome P-450 Enzyme System/genetics , Cytochrome P450 Family 4 , Drug Administration Schedule , Electronic Health Records , Female , Humans , Male , Middle Aged , Mixed Function Oxygenases/genetics , Polymorphism, Single Nucleotide/genetics , Substance-Related Disorders , Vitamin K Epoxide Reductases
ABSTRACT
BACKGROUND: Traditional electrocardiographic (ECG) reference ranges were derived from studies in communities or clinical trial populations. The distribution of ECG parameters in a large population presenting to a healthcare system has not been studied. OBJECTIVE: The purpose of this study was to define the contribution of age, race, gender, height, body mass index, and type 2 diabetes mellitus to normal ECG parameters in a population presenting to a healthcare system. METHODS: Study subjects were obtained from the Vanderbilt Synthetic Derivative, a de-identified image of the electronic medical record (EMR), containing more than 20 years of records on 1.7 million subjects. We identified 63,177 unique subjects with an ECG that was read as "normal" by the reviewing cardiologist. Using combinations of natural language processing and laboratory and billing code queries, we identified a subset of 32,949 subjects without cardiovascular disease, interfering medications, or abnormal electrolytes. The ethnic makeup was 77% Caucasian, 13% African American, 1% Hispanic, 1% Asian, and 8% unknown. RESULTS: The range that included 95% of normal PR intervals was 125-196 ms, QRS 69-103 ms, QT interval corrected with Bazett formula 365-458 ms, and heart rate 54-96 bpm. Linear regression modeling of patient characteristic effects reproduced known age and gender effects and identified novel associations with race, body mass index, and type 2 diabetes mellitus. A web-based application for patient-specific normal ranges is available online at http://biostat.mc.vanderbilt.edu/ECGPredictionInterval. CONCLUSION: Analysis of a large set of EMR-derived normal ECGs reproduced known associations, found new relationships, and established patient-specific normal ranges. Such knowledge informs clinical and genetic research and may improve understanding of normal cardiac physiology.
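Two calculations underlie the reported ranges: the Bazett correction of the QT interval and a range that contains 95% of normal values. The sketch below shows the standard Bazett formula (QTc = QT / sqrt(RR), with RR in seconds) and, as an assumption, takes the 95% range to be the span between the 2.5th and 97.5th percentiles; the abstract does not state how its ranges were computed, and the function names are hypothetical.

```python
import math


def bazett_qtc(qt_ms, heart_rate_bpm):
    """Bazett-corrected QT interval in ms: QT divided by the square
    root of the RR interval expressed in seconds."""
    rr_seconds = 60.0 / heart_rate_bpm
    return qt_ms / math.sqrt(rr_seconds)


def central_interval(values, coverage=0.95):
    """Lower and upper bounds of the central `coverage` fraction of
    `values`, using linear interpolation between order statistics.
    (Assumed method; the abstract does not specify one.)"""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)

    def quantile(q):
        position = q * (len(ordered) - 1)
        lower = math.floor(position)
        upper = math.ceil(position)
        fraction = position - lower
        return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction

    tail = (1.0 - coverage) / 2.0
    return quantile(tail), quantile(1.0 - tail)


# At 60 bpm the RR interval is 1 s, so QTc equals the measured QT.
qtc_at_60_bpm = bazett_qtc(400.0, 60.0)
```

Applied to a list of corrected QT values from ECGs read as normal, `central_interval` would yield a range of the kind reported in the abstract.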