1.
Nat Commun ; 14(1): 5419, 2023 09 05.
Article in English | MEDLINE | ID: mdl-37669985

ABSTRACT

Recently, large-scale genomic projects such as All of Us and the UK Biobank have introduced a new research paradigm where data are stored centrally in cloud-based Trusted Research Environments (TREs). To characterize the advantages and drawbacks of different TRE attributes in facilitating cross-cohort analysis, we conduct a Genome-Wide Association Study of standard lipid measures using two approaches: meta-analysis and pooled analysis. Comparison of full summary data from both approaches with an external study shows strong correlation of known loci with lipid levels (R² ~ 83-97%). Importantly, 90 variants meet the significance threshold only in the meta-analysis and 64 variants are significant only in pooled analysis, with approximately 20% of variants in each of those groups being most prevalent in non-European, non-Asian ancestry individuals. These findings have important implications, as technical and policy choices lead to cross-cohort analyses generating similar, but not identical results, particularly for non-European ancestral populations.


Subjects
Genome-Wide Association Study , Population Health , Humans , Genomics , Policy , Lipids
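The contrast between meta-analysis and pooled analysis in entry 1 can be illustrated with a minimal sketch of one common way to combine per-cohort results. The abstract does not say which meta-analysis method was used, so the fixed-effect inverse-variance weighting below is an assumption, and the function name and input values are hypothetical.

```python
import math


def inverse_variance_meta(betas, standard_errors):
    """Combine per-cohort effect estimates for one variant.

    Fixed-effect inverse-variance weighting (an assumed method, not taken
    from the abstract): each cohort is weighted by 1 / SE^2.
    Returns (combined_beta, combined_se, z_score).
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    combined_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    combined_se = math.sqrt(1.0 / total_weight)
    return combined_beta, combined_se, combined_beta / combined_se


# Hypothetical per-cohort estimates for a single variant (illustration only).
beta, se, z = inverse_variance_meta([0.10, 0.30], [0.10, 0.10])
print(f"beta={beta:.3f} se={se:.3f} z={z:.2f}")
```

A pooled analysis would instead fit one model to the individual-level data from all cohorts together, which is only possible when the data can be brought into a single environment.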
2.
Annu Rev Biomed Data Sci ; 6: 443-464, 2023 08 10.
Article in English | MEDLINE | ID: mdl-37561600

ABSTRACT

The All of Us Research Program's Data and Research Center (DRC) was established to help acquire, curate, and provide access to one of the world's largest and most diverse datasets for precision medicine research. Already, over 500,000 participants are enrolled in All of Us, 80% of whom are underrepresented in biomedical research, and data are being analyzed by a community of over 2,300 researchers. The DRC created this thriving data ecosystem by collaborating with engaged participants, innovative program partners, and empowered researchers. In this review, we first describe how the DRC is organized to meet the needs of this broad group of stakeholders. We then outline guiding principles, common challenges, and innovative approaches used to build the All of Us data ecosystem. Finally, we share lessons learned to help others navigate important decisions and trade-offs in building a modern biomedical data platform.


Subjects
Biomedical Research , Population Health , Humans , Ecosystem , Precision Medicine
3.
PLoS One ; 12(2): e0171745, 2017.
Article in English | MEDLINE | ID: mdl-28222112

ABSTRACT

Resistant hypertension is defined as high blood pressure that remains above treatment goals in spite of the concurrent use of three antihypertensive agents from different classes. Despite the important health consequences of resistant hypertension, few studies of resistant hypertension have been conducted. To perform a genome-wide association study for resistant hypertension, we defined and identified cases of resistant hypertension and hypertensives with treated, controlled hypertension among >47,500 adults residing in the US linked to electronic health records (EHRs) and genotyped as part of the electronic MEdical Records & GEnomics (eMERGE) Network. Electronic selection logic using billing codes, laboratory values, text queries, and medication records was used to identify resistant hypertension cases and controls at each site, and a total of 3,006 cases of resistant hypertension and 876 controlled hypertensives were identified among eMERGE Phase I and II sites. After imputation and quality control, a total of 2,530,150 SNPs were tested for an association among 2,830 multi-ethnic cases of resistant hypertension and 876 controlled hypertensives. No test of association was genome-wide significant in the full dataset or in the dataset limited to European American cases (n = 1,719) and controls (n = 708). The most significant finding was CLNK rs13144136 at p = 1.00 × 10⁻⁶ (odds ratio = 0.68; 95% CI = 0.58-0.80) in the full dataset with similar results in the European American only dataset. We also examined whether SNPs known to influence blood pressure or hypertension also influenced resistant hypertension. None was significant after correction for multiple testing. These data highlight both the difficulties and the potential utility of EHR-linked genomic data to study clinically-relevant traits such as resistant hypertension.


Subjects
Antihypertensive Agents/therapeutic use , Drug Resistance/genetics , Electronic Health Records , Genome-Wide Association Study , Hypertension/genetics , Adult , Aged , Algorithms , Blood Pressure/genetics , Case-Control Studies , Computer Communication Networks , Datasets as Topic , Ethnicity/genetics , Genotype , Humans , Hypertension/drug therapy , Hypertension/epidemiology , Male , Middle Aged , Polymorphism, Single Nucleotide , Risk Factors
4.
J Clin Epidemiol ; 72: 107-15, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26628336

ABSTRACT

OBJECTIVES: We describe the development, implementation, and evaluation of a model to pre-emptively select patients for genotyping based on medication exposure risk. STUDY DESIGN AND SETTING: Using deidentified electronic health records, we derived a prognostic model for the prescription of statins, warfarin, or clopidogrel. The model was implemented into a clinical decision support (CDS) tool to recommend pre-emptive genotyping for patients exceeding a prescription risk threshold. We evaluated the rule on an independent validation cohort and on an implementation cohort, representing the population in which the CDS tool was deployed. RESULTS: The model exhibited moderate discrimination with area under the receiver operator characteristic curves ranging from 0.68 to 0.75 at 1 and 2 years after index dates. Risk estimates tended to underestimate true risk. The cumulative incidences of medication prescriptions at 1 and 2 years were 0.35 and 0.48, respectively, among 1,673 patients flagged by the model. The cumulative incidences in the same number of randomly sampled subjects were 0.12 and 0.19, and in patients over 50 years with the highest body mass indices, they were 0.22 and 0.34. CONCLUSION: We demonstrate that prognostic algorithms can guide pre-emptive pharmacogenetic testing toward those likely to benefit from it.


Subjects
Drug Utilization/statistics & numerical data , Electronic Health Records/organization & administration , Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use , Pharmacogenetics/organization & administration , Ticlopidine/analogs & derivatives , Warfarin/therapeutic use , Adult , Age Factors , Aged , Clopidogrel , Decision Support Systems, Clinical , Female , Humans , Longitudinal Studies , Male , Middle Aged , Models, Statistical , Predictive Value of Tests , Prognosis , Program Evaluation , Proportional Hazards Models , Reproducibility of Results , Risk Factors , Sex Factors , Ticlopidine/therapeutic use , United States
5.
PLoS One ; 9(12): e111301, 2014.
Article in English | MEDLINE | ID: mdl-25436638

ABSTRACT

Thyroid stimulating hormone (TSH) levels are normally tightly regulated within an individual; thus, relatively small variations may indicate thyroid disease. Genome-wide association studies (GWAS) have identified variants in PDE8B and FOXE1 that are associated with TSH levels. However, prior studies lacked racial/ethnic diversity, limiting the generalization of these findings to individuals of non-European ethnicities. The Electronic Medical Records and Genomics (eMERGE) Network is a collaboration across institutions with biobanks linked to electronic medical records (EMRs). The eMERGE Network uses EMR-derived phenotypes to perform GWAS in diverse populations for a variety of phenotypes. In this report, we identified serum TSH levels from 4,501 European American and 351 African American euthyroid individuals in the eMERGE Network with existing GWAS data. Tests of association were performed using linear regression and adjusted for age, sex, body mass index (BMI), and principal components, assuming an additive genetic model. Our results replicate the known association of PDE8B with serum TSH levels in European Americans (rs2046045 p = 1.85 × 10⁻¹⁷, β = 0.09). FOXE1 variants, associated with hypothyroidism, were not genome-wide significant (rs10759944: p = 1.08 × 10⁻⁶, β = -0.05). No SNPs reached genome-wide significance in African Americans. However, multiple known associations with TSH levels in European ancestry were nominally significant in African Americans, including PDE8B (rs2046045 p = 0.03, β = -0.09), VEGFA (rs11755845 p = 0.01, β = -0.13), and NFIA (rs334699 p = 1.50 × 10⁻³, β = -0.17). We found little evidence that SNPs previously associated with other thyroid-related disorders were associated with serum TSH levels in this study. These results support the previously reported association between PDE8B and serum TSH levels in European Americans and emphasize the need for additional genetic studies in more diverse populations.


Subjects
Black or African American/genetics , Polymorphism, Single Nucleotide , Thyrotropin/blood , White People/genetics , Africa/ethnology , Aged , Aged, 80 and over , Body Mass Index , Electronic Health Records , Europe/ethnology , Female , Genome-Wide Association Study , Humans , Male , Middle Aged , Thyroid Diseases/blood , Thyroid Diseases/genetics
6.
PLoS One ; 9(3): e86931, 2014.
Article in English | MEDLINE | ID: mdl-24595071

ABSTRACT

Type 2 diabetes (T2D) is a complex metabolic disease that disproportionately affects African Americans. Genome-wide association studies (GWAS) have identified several loci that contribute to T2D in European Americans, but few studies have been performed in admixed populations. We first performed a GWAS of 1,563 African Americans from the Vanderbilt Genome-Electronic Records Project and Northwestern University NUgene Project as part of the electronic Medical Records and Genomics (eMERGE) network. We successfully replicate an association in TCF7L2, previously identified by GWAS, in this African American dataset. We were unable to identify novel associations at p < 5.0 × 10⁻⁸ by GWAS. Using admixture mapping as an alternative method for discovery, we performed a genome-wide admixture scan that suggests multiple candidate genes associated with T2D. One finding, TCIRG1, is a T-cell immune regulator expressed in the pancreas and liver that has not been previously implicated for T2D. We performed subsequent fine-mapping to further assess the association between TCIRG1 and T2D in >5,000 African Americans. We identified 13 independent associations between TCIRG1, CHKA, and ALDH3B1 genes on chromosome 11 and T2D. Our results suggest a novel region on chromosome 11 identified by admixture mapping is associated with T2D in African Americans.


Subjects
Black People/genetics , Chromosome Mapping/methods , Chromosomes, Human, Pair 11 , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/ethnology , Genome-Wide Association Study , Humans , Linkage Disequilibrium , Polymorphism, Single Nucleotide
7.
Article in English | MEDLINE | ID: mdl-25590050

ABSTRACT

The Nav1.5 sodium channel α-subunit is the predominant α-subunit expressed in the heart and is associated with cardiac arrhythmias. We tested five previously identified SCN5A variants (rs7374138, rs7637849, rs7637849, rs7629265, and rs11129796) for an association with PR interval and QRS duration in two unique study populations: the Third National Health and Nutrition Examination Survey (NHANES III, n = 552) accessed by the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) and a combined dataset (n = 455) from two biobanks linked to electronic medical records from Vanderbilt University (BioVU) and Northwestern University (NUgene) as part of the electronic Medical Records & Genomics (eMERGE) network. A meta-analysis including all three study populations (n ~ 4,000) suggests that eight SCN5A associations were significant for both QRS duration and PR interval (p < 5.0E-3) with little evidence for heterogeneity across the study populations. These results suggest that published SCN5A associations replicate across different study designs in a meta-analysis and represent an important first step in the use of multiple study designs for genetic studies and the identification/characterization of genetic variants associated with ECG traits in African-descent populations.

8.
Nat Biotechnol ; 31(12): 1102-10, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24270849

ABSTRACT

Candidate gene and genome-wide association studies (GWAS) have identified genetic variants that modulate risk for human disease; many of these associations require further study to replicate the results. Here we report the first large-scale application of the phenome-wide association study (PheWAS) paradigm within electronic medical records (EMRs), an unbiased approach to replication and discovery that interrogates relationships between targeted genotypes and multiple phenotypes. We scanned for associations between 3,144 single-nucleotide polymorphisms (previously implicated by GWAS as mediators of human traits) and 1,358 EMR-derived phenotypes in 13,835 individuals of European ancestry. This PheWAS replicated 66% (51/77) of sufficiently powered prior GWAS associations and revealed 63 potentially pleiotropic associations with P < 4.6 × 10⁻⁶ (false discovery rate < 0.1); the strongest of these novel associations were replicated in an independent cohort (n = 7,406). These findings validate PheWAS as a tool to allow unbiased interrogation across multiple phenotypes in EMR-based cohorts and to enhance analysis of the genomic basis of human disease.


Subjects
Electronic Health Records/statistics & numerical data , Genetic Predisposition to Disease/epidemiology , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Genome-Wide Association Study/statistics & numerical data , Medical Record Linkage/methods , Polymorphism, Single Nucleotide/genetics , Chromosome Mapping/methods , Data Mining/methods , Humans , Phenotype
9.
Genet Med ; 15(10): 761-71, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23743551

ABSTRACT

The Electronic Medical Records and Genomics Network is a National Human Genome Research Institute-funded consortium engaged in the development of methods and best practices for using the electronic medical record as a tool for genomic research. Now in its sixth year and second funding cycle, and comprising nine research groups and a coordinating center, the network has played a major role in validating the concept that clinical data derived from electronic medical records can be used successfully for genomic research. Current work is advancing knowledge in multiple disciplines at the intersection of genomics and health-care informatics, particularly for electronic phenotyping, genome-wide association studies, genomic medicine implementation, and the ethical and regulatory issues associated with genomics research and returning results to study participants. Here, we describe the evolution, accomplishments, opportunities, and challenges of the network from its inception as a five-group consortium focused on genotype-phenotype associations for genomic discovery to its current form as a nine-group consortium pivoting toward the implementation of genomic medicine.


Subjects
Electronic Health Records , Genetic Research , Genomics , Electronic Health Records/trends , Genetic Research/ethics , Genome-Wide Association Study , Genomics/ethics , Genomics/trends , Genotype , Humans , National Human Genome Research Institute (U.S.) , Phenotype , Precision Medicine , United States
10.
Pharmacogenomics ; 14(7): 735-44, 2013 May.
Article in English | MEDLINE | ID: mdl-23651022

ABSTRACT

BACKGROUND: The ADME Core Panel assays 184 variants across 34 pharmacogenes, many of which are difficult to accurately genotype with standard multiplexing methods. METHODS: We genotyped 326 frequently medicated individuals of European descent in Vanderbilt's biorepository linked to de-identified electronic medical records, BioVU, on the ADME Core Panel to assess quality and performance of the assay. We compared quality control metrics and determined the extent of direct and indirect marker overlap between the ADME Core Panel and the Illumina Omni1-Quad. RESULTS: We found the quality of the ADME Core Panel data to be high, with exceptions in select copy number variants and markers in certain genes (notably CYP2D6). Most of the common variants on the ADME panel are genotyped by the Omni1, but absent rare variants and copy number variants could not be accurately tagged by single markers. CONCLUSION: Our frequently medicated study population did not convincingly differ in allele frequency from reference populations, suggesting that heterogeneous clinical samples (with respect to medications) have similar allele frequency distributions in pharmacogenetics genes compared with reference populations.


Subjects
Electronic Health Records , Genetic Markers/genetics , Pharmacogenetics , Polypharmacy , Adult , Aged , Aged, 80 and over , Cytochrome P-450 CYP2D6/genetics , DNA Copy Number Variations , Female , Gene Frequency , Genome-Wide Association Study/methods , Genotype , Humans , Male , Middle Aged , Polymorphism, Single Nucleotide , White People/genetics , Young Adult
11.
Circulation ; 127(13): 1377-85, 2013 Apr 02.
Article in English | MEDLINE | ID: mdl-23463857

ABSTRACT

BACKGROUND: ECG QRS duration, a measure of cardiac intraventricular conduction, varies ≈2-fold in individuals without cardiac disease. Slow conduction may promote re-entrant arrhythmias. METHODS AND RESULTS: We performed a genome-wide association study to identify genomic markers of QRS duration in 5272 individuals without cardiac disease selected from electronic medical record algorithms at 5 sites in the Electronic Medical Records and Genomics (eMERGE) network. The most significant loci were evaluated within the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium QRS genome-wide association study meta-analysis. Twenty-three single-nucleotide polymorphisms in 5 loci, previously described by CHARGE, were replicated in the eMERGE samples; 18 single-nucleotide polymorphisms were in the chromosome 3 SCN5A and SCN10A loci, where the most significant single-nucleotide polymorphisms were rs1805126 in SCN5A with P = 1.2 × 10⁻⁸ (eMERGE) and P = 2.5 × 10⁻²⁰ (CHARGE) and rs6795970 in SCN10A with P = 6 × 10⁻⁶ (eMERGE) and P = 5 × 10⁻²⁷ (CHARGE). The other loci were in NFIA, near CDKN1A, and near C6orf204. We then performed phenome-wide association studies on variants in these 5 loci in 13859 European Americans to search for diagnoses associated with these markers. Phenome-wide association study identified atrial fibrillation and cardiac arrhythmias as the most common associated diagnoses with SCN10A and SCN5A variants. SCN10A variants were also associated with subsequent development of atrial fibrillation and arrhythmia in the original 5272 "heart-healthy" study population. CONCLUSIONS: We conclude that DNA biobanks coupled to electronic medical records not only provide a platform for genome-wide association study but also may allow broad interrogation of the longitudinal incidence of disease associated with genetic variants. The phenome-wide association study approach implicated sodium channel variants modulating QRS duration in subjects without cardiac disease as predictors of subsequent arrhythmias.


Subjects
Arrhythmias, Cardiac/diagnosis , Arrhythmias, Cardiac/genetics , Genetic Markers/genetics , Genome-Wide Association Study/methods , Heart Conduction System/physiopathology , Heart Rate/genetics , Adult , Aged , Aged, 80 and over , Arrhythmias, Cardiac/epidemiology , Female , Heart Conduction System/metabolism , Humans , Male , Middle Aged , Phenotype , Polymorphism, Single Nucleotide/genetics , Risk Factors
12.
Ann Hum Genet ; 77(4): 321-32, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23534349

ABSTRACT

Electrocardiographic (ECG) measurements vary by ancestry. Genome-wide association studies (GWAS) have identified loci that contribute to ECG measurements; however, most are performed in Europeans collected from population-based cohorts or surveys. The strongest associations reported are in NOS1AP with QT interval and SCN10A with PR and QRS durations. The extent to which these associations can be generalized to African Americans has yet to be determined. Using electronic medical records, PR and QT intervals, QRS duration, and heart rate were determined in 455 African Americans as part of the Vanderbilt Genome-Electronic Records Project and Northwestern University NUgene Project. We tested for an association between these ECG traits and >930K SNPs. We identified a total of 36 novel associations with PR interval, QRS duration, QT interval, and heart rate at p < 1.0 × 10⁻⁶. Using published GWAS data, we compared our results with those previously identified in other populations. Five associations originally identified in other populations generalized with respect to statistical significance and direction of effect. A total of 43 associations have a consistent direction of effect with European and/or Asian populations. This work provides a catalogue of generalized versus nongeneralized associations, a necessary step in prioritizing GWAS-identified regions for further fine-mapping in diverse populations.


Subjects
Black or African American/genetics , Electrocardiography , Genetic Variation , Genome-Wide Association Study , Quantitative Trait, Heritable , Adult , Alleles , Chromosome Mapping , Ethnicity/genetics , Female , Gene Frequency , Genetic Association Studies , Genotype , Humans , Male , Middle Aged , Polymorphism, Single Nucleotide , White People/genetics
13.
Pac Symp Biocomput ; : 373-84, 2013.
Article in English | MEDLINE | ID: mdl-23424142

ABSTRACT

Genetic association studies have rapidly become a major tool for identifying the genetic basis of common human diseases. The advent of cost-effective genotyping coupled with large collections of samples linked to clinical outcomes and quantitative traits now makes it possible to systematically characterize genotype-phenotype relationships in diverse populations and extensive datasets. To capitalize on these advancements, the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) project, as part of the collaborative Population Architecture using Genomics and Epidemiology (PAGE) study, accesses two collections: the National Health and Nutrition Examination Surveys (NHANES) and BioVU, Vanderbilt University's biorepository linked to de-identified electronic medical records. We describe herein the workflows for accessing and using the epidemiologic (NHANES) and clinical (BioVU) collections, where each workflow has been customized to reflect the content and data access limitations of each respective source. We also describe the process by which these data are generated, standardized, and shared for meta-analysis among the PAGE study sites. As a specific example of the use of BioVU, we describe the data mining efforts to define cases and controls for genetic association studies of common cancers in PAGE. Collectively, the efforts described here are a generalized outline for many of the successful approaches that can be used in the era of high-throughput genotype-phenotype associations for moving biomedical discovery forward to new frontiers of data generation and analysis.


Subjects
Gene-Environment Interaction , Genetic Association Studies/statistics & numerical data , Computational Biology , Databases, Nucleic Acid/statistics & numerical data , Genetics, Population/statistics & numerical data , High-Throughput Screening Assays/statistics & numerical data , Humans , Linear Models , Neoplasms/genetics , Nutrition Surveys/statistics & numerical data , Polymorphism, Single Nucleotide , Registries/statistics & numerical data
14.
Pharmacogenomics ; 13(4): 407-18, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22329724

ABSTRACT

AIM: Warfarin pharmacogenomic algorithms reduce dosing error, but perform poorly in non-European-Americans. Electronic health record (EHR) systems linked to biobanks may allow for pharmacogenomic analysis, but they have not yet been used for this purpose. PATIENTS & METHODS: We used BioVU, the Vanderbilt EHR-linked DNA repository, to identify European-Americans (n = 1022) and African-Americans (n = 145) on stable warfarin therapy and evaluated the effect of 15 pharmacogenetic variants on stable warfarin dose. RESULTS: Associations between variants in VKORC1, CYP2C9 and CYP4F2 with weekly dose were observed in European-Americans as well as additional variants in CYP2C9 and CALU in African-Americans. Compared with traditional 5 mg/day dosing, implementing the US FDA recommendations or the International Warfarin Pharmacogenomics Consortium (IWPC) algorithm reduced error in weekly dose in European-Americans (13.5-12.4 and 9.5 mg/week, respectively) but less so in African-Americans (15.2-15.0 and 13.8 mg/week, respectively). By further incorporating associated variants specific for European-Americans and African-Americans in an expanded algorithm, dose-prediction error reduced to 9.1 mg/week (95% CI: 8.4-9.6) in European-Americans and 12.4 mg/week (95% CI: 10.0-13.2) in African-Americans. The expanded algorithm explained 41 and 53% of dose variation in African-Americans and European-Americans, respectively, compared with 29 and 50%, respectively, for the IWPC algorithm. Implementing these predictions via dispensable pill regimens similarly reduced dosing error. CONCLUSION: These results validate EHR-linked DNA biorepositories as real-world resources for pharmacogenomic validation and discovery.


Subjects
Anticoagulants/administration & dosage , Black or African American/genetics , Dose-Response Relationship, Drug , Warfarin/administration & dosage , White People/genetics , Adult , Aged , Aged, 80 and over , Aryl Hydrocarbon Hydroxylases/genetics , Calcium-Binding Proteins/genetics , Cytochrome P-450 CYP2C9 , Cytochrome P-450 Enzyme System/genetics , Cytochrome P450 Family 4 , Drug Administration Schedule , Electronic Health Records , Female , Humans , Male , Middle Aged , Mixed Function Oxygenases/genetics , Polymorphism, Single Nucleotide/genetics , Substance-Related Disorders , Vitamin K Epoxide Reductases
15.
Am J Hum Genet ; 89(4): 529-42, 2011 Oct 07.
Article in English | MEDLINE | ID: mdl-21981779

ABSTRACT

We repurposed existing genotypes in DNA biobanks across the Electronic Medical Records and Genomics network to perform a genome-wide association study for primary hypothyroidism, the most common thyroid disease. Electronic selection algorithms incorporating billing codes, laboratory values, text queries, and medication records identified 1317 cases and 5053 controls of European ancestry within five electronic medical records (EMRs); the algorithms' positive predictive values were 92.4% and 98.5% for cases and controls, respectively. Four single-nucleotide polymorphisms (SNPs) in linkage disequilibrium at 9q22 near FOXE1 were associated with hypothyroidism at genome-wide significance, the strongest being rs7850258 (odds ratio [OR] 0.74, p = 3.96 × 10⁻⁹). This association was replicated in a set of 263 cases and 1616 controls (OR = 0.60, p = 5.7 × 10⁻⁶). A phenome-wide association study (PheWAS) that was performed on this locus with 13,617 individuals and more than 200,000 patient-years of billing data identified associations with additional phenotypes: thyroiditis (OR = 0.58, p = 1.4 × 10⁻⁵), nodular (OR = 0.76, p = 3.1 × 10⁻⁵) and multinodular (OR = 0.69, p = 3.9 × 10⁻⁵) goiters, and thyrotoxicosis (OR = 0.76, p = 1.5 × 10⁻³), but not Graves disease (OR = 1.03, p = 0.82). Thyroid cancer, previously associated with this locus, was not significantly associated in the PheWAS (OR = 1.29, p = 0.09). The strongest association in the PheWAS was hypothyroidism (OR = 0.76, p = 2.7 × 10⁻¹³), which had an odds ratio that was nearly identical to that of the curated case-control population in the primary analysis, providing further validation of the PheWAS method. Our findings indicate that EMR-linked genomic data could allow discovery of genes associated with many diseases without additional genotyping cost.


Subjects
Forkhead Transcription Factors/genetics , Hypothyroidism/genetics , Aged , Algorithms , Female , Genetic Markers , Genetic Variation , Genome , Genome-Wide Association Study , Genotype , Humans , Male , Medical Records Systems, Computerized , Middle Aged , Phenotype , Predictive Value of Tests
16.
J Am Med Inform Assoc ; 18(4): 387-91, 2011.
Article in English | MEDLINE | ID: mdl-21672908

ABSTRACT

OBJECTIVE: DNA biobanks linked to comprehensive electronic health records systems are potentially powerful resources for pharmacogenetic studies. This study sought to develop natural-language-processing algorithms to extract drug-dose information from clinical text, and to assess the capabilities of such tools to automate the data-extraction process for pharmacogenetic studies. MATERIALS AND METHODS: A manually validated warfarin pharmacogenetic study identified a cohort of 1125 patients with a stable warfarin dose, in which 776 patients were managed by Coumadin Clinic physicians, and the remaining 349 patients were managed by their providers. The authors developed two algorithms to extract weekly warfarin doses from both data sets: a regular expression-based program for semistructured Coumadin Clinic notes; and an advanced weekly dose calculator based on an existing medication information extraction system (MedEx) for narrative providers' notes. The authors then conducted an association analysis between an automatically extracted stable weekly dose of warfarin and four genetic variants of VKORC1 and CYP2C9 genes. The performance of the weekly dose-extraction program was evaluated by comparing it with a gold standard containing manually curated weekly doses. Precision, recall, F-measure, and overall accuracy were reported. Associations between known variants in VKORC1 and CYP2C9 and warfarin stable weekly dose were performed with linear regression adjusted for age, gender, and body mass index. RESULTS: The authors' evaluation showed that the MedEx-based system could determine patients' warfarin weekly doses with 99.7% recall, 90.8% precision, and 93.8% accuracy. Using the automatically extracted weekly doses of warfarin, the authors successfully replicated the previous known associations between warfarin stable dose and genetic variants in VKORC1 and CYP2C9.


Subjects
Anticoagulants/administration & dosage , Data Mining/methods , Databases, Nucleic Acid , Drug Dosage Calculations , Electronic Health Records , Natural Language Processing , Warfarin/administration & dosage , Algorithms , Aryl Hydrocarbon Hydroxylases/genetics , Cytochrome P-450 CYP2C9 , Genome-Wide Association Study , Humans , Linear Models , Mixed Function Oxygenases/genetics , Pharmacogenetics , Precision Medicine , United States , Vitamin K Epoxide Reductases
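Entry 16 reports precision, recall, and F-measure for the dose-extraction program against a manually curated gold standard. As a minimal sketch of how those three metrics relate, the snippet below computes them from counts of true positives, false positives, and false negatives. The function name and the example counts are hypothetical and are not taken from the study; the F-measure is assumed to be the balanced F1 score.

```python
def precision_recall_f1(true_pos: int, false_pos: int, false_neg: int):
    """Return (precision, recall, F1) from confusion counts.

    precision = TP / (TP + FP), recall = TP / (TP + FN),
    F1 = harmonic mean of precision and recall.
    Ratios with an empty denominator are reported as 0.0.
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only (not the study's data).
p, r, f = precision_recall_f1(true_pos=8, false_pos=2, false_neg=2)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```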
17.
Heart Rhythm ; 8(2): 271-7, 2011 Feb.
Article in English | MEDLINE | ID: mdl-21044898

ABSTRACT

BACKGROUND: Traditional electrocardiographic (ECG) reference ranges were derived from studies in communities or clinical trial populations. The distribution of ECG parameters in a large population presenting to a healthcare system has not been studied. OBJECTIVE: The purpose of this study was to define the contribution of age, race, gender, height, body mass index, and type 2 diabetes mellitus to normal ECG parameters in a population presenting to a healthcare system. METHODS: Study subjects were obtained from the Vanderbilt Synthetic Derivative, a de-identified image of the electronic medical record (EMR), containing more than 20 years of records on 1.7 million subjects. We identified 63,177 unique subjects with an ECG that was read as "normal" by the reviewing cardiologist. Using combinations of natural language processing and laboratory and billing code queries, we identified a subset of 32,949 subjects without cardiovascular disease, interfering medications, or abnormal electrolytes. The ethnic makeup was 77% Caucasian, 13% African American, 1% Hispanic, 1% Asian, and 8% unknown. RESULTS: The range that included 95% of normal values was 125-196 ms for the PR interval, 69-103 ms for QRS duration, 365-458 ms for the QT interval corrected with the Bazett formula, and 54-96 bpm for heart rate. Linear regression modeling of patient characteristic effects reproduced known age and gender effects and identified novel associations with race, body mass index, and type 2 diabetes mellitus. A web-based application for patient-specific normal ranges is available online at http://biostat.mc.vanderbilt.edu/ECGPredictionInterval. CONCLUSION: Analysis of a large set of EMR-derived normal ECGs reproduced known associations, found new relationships, and established patient-specific normal ranges. Such knowledge informs clinical and genetic research and may improve understanding of normal cardiac physiology.
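The Bazett correction mentioned in this abstract divides the measured QT interval by the square root of the RR interval in seconds. The sketch below is illustrative only: the function names are hypothetical, the RR interval is derived from heart rate as 60 / bpm, and the 365-458 ms bounds are the population-wide 95% range quoted in the abstract, not the patient-specific ranges produced by the authors' web application.

```python
import math

# Population-wide 95% range for Bazett-corrected QT, as quoted in the abstract (ms).
QTC_RANGE_MS = (365.0, 458.0)


def bazett_qtc_ms(qt_ms: float, heart_rate_bpm: float) -> float:
    """Return the Bazett-corrected QT interval in milliseconds.

    QTc = QT / sqrt(RR), with RR in seconds estimated as 60 / heart rate.
    """
    if qt_ms <= 0 or heart_rate_bpm <= 0:
        raise ValueError("QT interval and heart rate must be positive")
    rr_seconds = 60.0 / heart_rate_bpm
    return qt_ms / math.sqrt(rr_seconds)


def within_quoted_range(qtc_ms: float) -> bool:
    """Check a corrected QT value against the range quoted in the abstract."""
    low, high = QTC_RANGE_MS
    return low <= qtc_ms <= high


# Hypothetical measurement: QT of 360 ms at 75 bpm (RR = 0.8 s).
qtc = bazett_qtc_ms(360.0, 75.0)
in_range = within_quoted_range(qtc)
```

At 60 bpm the RR interval is exactly one second, so the corrected and measured QT values coincide, which is a convenient sanity check for the function.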


Subjects
Electrocardiography , Electronic Health Records , Heart Rate/physiology , Adult , Age Factors , Body Height , Body Mass Index , Databases, Factual , Diabetes Mellitus, Type 2 , Female , Heart Conduction System/physiology , Humans , Male , Middle Aged , Reference Values , Retrospective Studies , Sex Factors , Time Factors
18.
Circulation ; 122(20): 2016-21, 2010 Nov 16.
Article in English | MEDLINE | ID: mdl-21041692

ABSTRACT

BACKGROUND: Recent genome-wide association studies in which selected community populations are used have identified genomic signals in SCN10A influencing PR duration. The extent to which this can be demonstrated in cohorts derived from electronic medical records is unknown. METHODS AND RESULTS: We performed a genome-wide association study on 2334 European American patients with normal ECGs without evidence of prior heart disease from the Vanderbilt DNA databank, BioVU, which accrues subjects from routine patient care. Subjects were identified by combinations of natural language processing, laboratory queries, and billing code queries of deidentified medical record data. Subjects were 58% female, of mean (± SD) age 54 ± 15 years, and had mean PR intervals of 158 ± 18 ms. Genotyping was performed with the use of the Illumina Human660W-Quad platform. Our results identify 4 single nucleotide polymorphisms (rs6800541, rs6795970, rs6798015, rs7430477) linked to SCN10A associated with PR interval (P = 5.73 × 10⁻⁷ to 1.78 × 10⁻⁶). CONCLUSIONS: This genome-wide association study confirms a gene heretofore not implicated in cardiac pathophysiology as a modulator of PR interval in humans. This study is one of the first replication genome-wide association studies performed with the use of an electronic medical records-derived cohort, supporting their further use for genotype-phenotype analyses.


Subjects
Databases, Nucleic Acid , Electrocardiography , Electronic Health Records , Heart/physiopathology , Polymorphism, Single Nucleotide , Sodium Channels/genetics , Adult , Aged , Female , Genome-Wide Association Study , Genotype , Humans , Male , Middle Aged , NAV1.8 Voltage-Gated Sodium Channel
19.
J Biomed Inform ; 43(6): 914-23, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20688191

ABSTRACT

We describe a two-stage analytical approach for characterizing morbidity profile dissimilarity among patient cohorts using electronic medical records. We capture morbidities using the International Statistical Classification of Diseases and Related Health Problems (ICD-9) codes. In the first stage of the approach, separate logistic regression analyses are conducted for ICD-9 sections (e.g., "hypertensive disease" or "appendicitis"), and the odds ratios that describe adjusted differences in prevalence between two cohorts are displayed graphically. In the second stage, the results from the ICD-9 section analyses are combined into a general morbidity dissimilarity index (MDI). For illustration, we examine nine cohorts of patients representing six phenotypes (or controls) derived from five institutions, each a participant in the electronic MEdical REcords and GEnomics (eMERGE) network. The phenotypes studied include type II diabetes and type II diabetes controls, peripheral arterial disease and peripheral arterial disease controls, normal cardiac conduction as measured by electrocardiography, and senile cataracts.


Subjects
Electronic Health Records , Morbidity , Cohort Studies , Diabetes Mellitus, Type 2/epidemiology , Humans , International Classification of Diseases , Peripheral Arterial Disease/epidemiology , Phenotype , Prevalence , United States
20.
Am J Hum Genet ; 86(4): 560-72, 2010 Apr 09.
Article in English | MEDLINE | ID: mdl-20362271

ABSTRACT

Large-scale DNA databanks linked to electronic medical record (EMR) systems have been proposed as an approach for rapidly generating large, diverse cohorts for discovery and replication of genotype-phenotype associations. However, the extent to which such resources are capable of delivering on this promise is unknown. We studied whether an EMR-linked DNA biorepository can be used to detect known genotype-phenotype associations for five diseases. Twenty-one SNPs previously implicated as common variants predisposing to atrial fibrillation, Crohn disease, multiple sclerosis, rheumatoid arthritis, or type 2 diabetes were successfully genotyped in 9483 samples accrued over 4 months into BioVU, the Vanderbilt University Medical Center DNA biobank. Previously reported odds ratios (OR(PR)) ranged from 1.14 to 2.36. For each phenotype, natural language processing techniques and billing-code queries were used to identify cases (n = 70-698) and controls (n = 808-3818) from deidentified health records. Each of the 21 tests of association yielded point estimates in the expected direction. Previous genotype-phenotype associations were replicated (p < 0.05) in 8 of 14 cases when the OR(PR) was > 1.25, and in 0 of 7 cases with lower OR(PR). Statistically significant associations were detected in all analyses that were adequately powered. In each of the five diseases studied, at least one previously reported association was replicated. These data demonstrate that phenotypes representing clinical diagnoses can be extracted from EMR systems, and they support the use of DNA resources coupled to EMR systems as tools for rapid generation of large data sets required for replication of associations found in research cohorts and for discovery in genome science.
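The replication logic described here, comparing an observed odds ratio against a previously reported one, can be sketched as follows. This is a generic illustration, not the authors' analysis: the abstract does not state the genetic model or any covariate adjustment, so the sketch assumes an unadjusted odds ratio from a 2x2 table with a Wald 95% confidence interval. The function names and all counts are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_with_ci(
    exposed_cases: int,
    unexposed_cases: int,
    exposed_controls: int,
    unexposed_controls: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Return (odds ratio, lower bound, upper bound) using a Wald interval on the log scale."""
    cells = (exposed_cases, unexposed_cases, exposed_controls, unexposed_controls)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive for this simple estimator")
    odds_ratio = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    # Standard error of log(OR) for a 2x2 table.
    std_error = math.sqrt(sum(1.0 / n for n in cells))
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * std_error), math.exp(log_or + z * std_error)


def same_direction(observed_or: float, reported_or: float) -> bool:
    """True when both odds ratios fall on the same side of 1."""
    return (observed_or > 1.0) == (reported_or > 1.0)


# Hypothetical counts; not data from the study.
observed, lower, upper = odds_ratio_with_ci(20, 80, 10, 90)
replicated_direction = same_direction(observed, reported_or=1.3)
# The interval excluding 1 corresponds to significance at roughly the 0.05 level.
significant = lower > 1.0 or upper < 1.0
```

An interval that excludes 1 corresponds approximately to the p < 0.05 criterion used in the abstract, while `same_direction` mirrors the weaker check that a point estimate lies in the expected direction.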


Subjects
Arthritis, Rheumatoid/genetics , Atrial Fibrillation/genetics , Crohn Disease/genetics , Diabetes Mellitus, Type 2/genetics , Electronic Health Records , Genetic Association Studies/trends , Multiple Sclerosis/genetics , Case-Control Studies , DNA/blood , DNA/genetics , Genome, Human , Genome-Wide Association Study , Genotype , Humans , Phenotype , Polymorphism, Single Nucleotide/genetics