ABSTRACT
We performed a phenome-wide association study (PheWAS) utilizing diverse genotypic and phenotypic data existing across multiple populations in the National Health and Nutrition Examination Surveys (NHANES), conducted by the Centers for Disease Control and Prevention (CDC), and accessed by the Epidemiological Architecture for Genes Linked to Environment (EAGLE) study. We calculated comprehensive tests of association in Genetic NHANES using 80 SNPs and 1,008 phenotypes (grouped into 184 phenotype classes), stratified by race-ethnicity. Genetic NHANES includes three surveys (NHANES III, 1999-2000, and 2001-2002) and three race-ethnicities: non-Hispanic whites (n = 6,634), non-Hispanic blacks (n = 3,458), and Mexican Americans (n = 3,950). We identified 69 PheWAS associations replicating across surveys for the same SNP, phenotype class, direction of effect, and race-ethnicity at p < 0.01, allele frequency > 0.01, and sample size > 200. Of these 69 PheWAS associations, 39 replicated previously reported SNP-phenotype associations, 9 were related to previously reported associations, and 21 were novel associations. Fourteen results had the same direction of effect across more than one race-ethnicity: one result was novel, 11 replicated previously reported associations, and two were related to previously reported results. Thirteen SNPs showed evidence of pleiotropy. We further explored results with gene-based biological networks, contrasting the direction of effect for pleiotropic associations across phenotypes. One PheWAS result was ABCG2 missense SNP rs2231142, associated with uric acid levels in both non-Hispanic whites and Mexican Americans, protoporphyrin levels in non-Hispanic whites and Mexican Americans, and blood pressure levels in Mexican Americans. Another example was SNP rs1800588 near LIPC, significantly associated with the novel phenotypes of folate levels (Mexican Americans), vitamin E levels (non-Hispanic whites), and triglyceride levels (non-Hispanic whites), and replicating for cholesterol levels. The results of this PheWAS show the utility of this approach for exposing more of the complex genetic architecture underlying multiple traits by generating novel hypotheses for future research.
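The replication rule described in this abstract (same SNP, phenotype class, direction of effect, and race-ethnicity in more than one survey, at p < 0.01, allele frequency > 0.01, and sample size > 200) can be sketched as a simple filter. This is a minimal illustration only: the `Result` record, its field names, and the example values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One SNP-phenotype test in one survey (hypothetical record layout)."""
    snp: str
    phenotype_class: str
    race_ethnicity: str
    survey: str
    beta: float          # effect estimate; only its sign is used here
    p_value: float
    allele_freq: float
    n: int


def passes_thresholds(r: Result) -> bool:
    # Thresholds as stated in the abstract.
    return r.p_value < 0.01 and r.allele_freq > 0.01 and r.n > 200


def replicated(results):
    """Return keys (snp, phenotype class, race-ethnicity, effect is positive)
    that pass the thresholds in at least two distinct surveys."""
    surveys_by_key = {}
    for r in results:
        if not passes_thresholds(r):
            continue
        key = (r.snp, r.phenotype_class, r.race_ethnicity, r.beta > 0)
        surveys_by_key.setdefault(key, set()).add(r.survey)
    return {key for key, surveys in surveys_by_key.items() if len(surveys) >= 2}


# Made-up example values, for illustration only.
example = [
    Result("rsA", "uric acid", "NHW", "NHANES III", 0.20, 0.001, 0.10, 3000),
    Result("rsA", "uric acid", "NHW", "NHANES 1999-2002", 0.15, 0.005, 0.10, 2500),
    Result("rsB", "folate", "MA", "NHANES III", 0.30, 0.001, 0.20, 1000),
    Result("rsB", "folate", "MA", "NHANES 1999-2002", -0.30, 0.001, 0.20, 1000),
]
print(replicated(example))
```

In this sketch the second SNP is not counted as replicated because its direction of effect differs between surveys.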
Subjects
Gene-Environment Interaction , Genome-Wide Association Study , Phenotype , Adult , Environment , Epidemiologic Research Design , Ethnicity/genetics , Ethnicity/statistics & numerical data , Female , Gene Frequency , Genome-Wide Association Study/statistics & numerical data , Humans , Male , Middle Aged , Nutrition Surveys , Polymorphism, Single Nucleotide , Quantitative Trait, Heritable , United States/epidemiology
ABSTRACT
Genome-wide association studies (GWASs) primarily performed in European-ancestry (EA) populations have identified numerous loci associated with body mass index (BMI). However, it is still unclear whether these GWAS loci can be generalized to other ethnic groups, such as African Americans (AAs). Furthermore, the putative functional variant or variants in these loci mostly remain under investigation. The overall lower linkage disequilibrium in AA compared to EA populations provides the opportunity to narrow in on, or fine-map, these BMI-related loci. Therefore, we used the Metabochip to densely genotype and evaluate 21 BMI GWAS loci identified in EA studies in 29,151 AAs from the Population Architecture using Genomics and Epidemiology (PAGE) study. Eight of the 21 loci (SEC16B, TMEM18, ETV5, GNPDA2, TFAP2B, BDNF, FTO, and MC4R) were found to be associated with BMI in AAs at a significance level of 5.8 × 10⁻⁵. Within seven of these eight loci, we found that, on average, a substantially smaller number of variants was correlated (r² > 0.5) with the most significant SNP in AA than in EA populations (16 versus 55). Conditional analyses revealed GNPDA2 harboring a potential additional independent signal. Moreover, Metabochip-wide discovery analyses revealed two BMI-related loci, BRE (rs116612809, p = 3.6 × 10⁻⁸) and DHX34 (rs4802349, p = 1.2 × 10⁻⁷), which were significant when adjustment was made for the total number of SNPs tested across the chip. These results demonstrate that fine mapping in AAs is a powerful approach for both narrowing in on the underlying causal variants in known loci and discovering BMI-related loci.
Subjects
Black or African American/genetics , Body Mass Index , Genome, Human , Genome-Wide Association Study/methods , Obesity/genetics , Adult , Aged , Aged, 80 and over , Female , Genetic Loci , Genetic Predisposition to Disease , Genotype , Humans , Linkage Disequilibrium , Male , Middle Aged , Obesity/ethnology , Polymorphism, Single Nucleotide , Young Adult
ABSTRACT
BACKGROUND/AIMS: Present-day limited resources demand DNA and phenotyping alternatives to the traditional prospective population-based epidemiologic collections. METHODS: To accelerate genomic discovery with an emphasis on diverse populations, we--as part of the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) study--accessed all non-European American samples (n = 15,863) available in BioVU, the Vanderbilt University biorepository linked to de-identified electronic medical records, for genomic studies as part of the larger Population Architecture using Genomics and Epidemiology (PAGE) I study. Given previous studies have cautioned against the secondary use of clinically collected data compared with epidemiologically collected data, we present here a characterization of EAGLE BioVU, including the billing and diagnostic (ICD-9) code distributions for adult and pediatric patients as well as comparisons made for select health metrics (body mass index, glucose, HbA1c, HDL-C, LDL-C, and triglycerides) with the population-based National Health and Nutrition Examination Surveys (NHANES) linked to DNA samples (NHANES III, n = 7,159; NHANES 1999-2002, n = 7,839). RESULTS: Overall, the distributions of billing and diagnostic codes suggest this clinical sample is a mixture of healthy and sick patients like that expected for a contemporary American population. CONCLUSION: Little bias is observed among health metrics, suggesting this clinical collection is suitable for genomic studies along with traditional epidemiologic cohorts.
Subjects
Genomics , Quantitative Trait, Heritable , Adult , Black or African American/genetics , Child , Demography , Female , Gene-Environment Interaction , Hispanic or Latino/genetics , Humans , Male , Nutrition Surveys
ABSTRACT
BACKGROUND: Both environmental and genetic factors impact type 2 diabetes (T2D). To identify such modifiers, we genotyped 15 T2D-associated variants from genome-wide association studies (GWAS) in 6,414 non-Hispanic whites, 3,073 non-Hispanic blacks, and 3,633 Mexican American participants from the National Health and Nutrition Examination Surveys (NHANES) and evaluated interactions between these variants and carbohydrate intake and fiber intake. RESULTS: We calculated a genetic risk score (GRS) with the 15 SNPs. The odds ratio for T2D with each GRS point was 1.10 (95% CI: 1.05-1.14) for non-Hispanic whites, 1.07 (95% CI: 1.02-1.13) for non-Hispanic blacks, and 1.11 (95% CI: 1.06-1.17) for Mexican Americans. We identified two gene-carbohydrate interactions (P < 0.05) in non-Hispanic whites (with CDKAL1 rs471253 and FTO rs8050136), two in non-Hispanic blacks (with IGF2BP2 rs4402960 and THADA rs7578597), and two in Mexican Americans (with NOTCH2 rs1092398 and TSPAN8-LGR5 rs7961581). We found three gene-fiber interactions in non-Hispanic whites (with ADAMTS9 rs4607103, CDKN2A/2B rs1801282, and FTO rs8050136), two in non-Hispanic blacks (with ADAMTS9 rs4607103 and THADA rs7578597), and two in Mexican Americans (with THADA rs7578597 and TSPAN8-LGR5 rs7961581) at the P < 0.05 level. Interactions between the GRS and nutrients did not reach significance in any of the racial/ethnic groups. CONCLUSION: Our results suggest that dietary carbohydrates and fiber may modify T2D-associated variants, highlighting the importance of dietary nutrients in predicting T2D risk.
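To make the "odds ratio per GRS point" concrete, the following minimal sketch assumes an unweighted score (a simple count of risk alleles across the SNPs) and a log-additive model, under which a per-point odds ratio compounds multiplicatively. The abstract does not state how the score was constructed, so both assumptions, along with the function names and example genotypes, are illustrative.

```python
import math


def genetic_risk_score(risk_allele_counts):
    """Unweighted GRS: sum of risk-allele counts (0, 1, or 2 per SNP).

    Assumption: the abstract does not say whether the score was weighted.
    """
    if any(c not in (0, 1, 2) for c in risk_allele_counts):
        raise ValueError("each SNP must contribute 0, 1, or 2 risk alleles")
    return sum(risk_allele_counts)


def odds_ratio_for_difference(per_point_or, points):
    """Odds ratio implied by a difference of `points` in the score,
    assuming effects are additive on the log-odds scale."""
    return math.exp(points * math.log(per_point_or))


# Hypothetical genotypes for two people at 15 SNPs.
person_a = [1, 0, 2, 1, 1, 0, 0, 1, 2, 1, 0, 1, 1, 0, 1]   # score 12
person_b = [2, 1, 2, 1, 1, 1, 0, 1, 2, 1, 1, 1, 1, 1, 1]   # score 17
diff = genetic_risk_score(person_b) - genetic_risk_score(person_a)

# With the per-point OR of 1.10 reported for non-Hispanic whites:
print(diff, round(odds_ratio_for_difference(1.10, diff), 2))
```

Under these assumptions, a five-point difference in the score corresponds to roughly 1.6-fold higher odds of T2D.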
Subjects
Diabetes Mellitus, Type 2/genetics , Dietary Carbohydrates , Dietary Fiber , Gene-Environment Interaction , Adolescent , Adult , Aged , Aged, 80 and over , Black People/genetics , Female , Genome-Wide Association Study , Genotype , Humans , Male , Mexican Americans/genetics , Middle Aged , Nutrition Surveys , Polymorphism, Single Nucleotide , Risk Factors , White People/genetics , Young Adult
ABSTRACT
BACKGROUND: Gallstone disease is one of the most common digestive disorders, affecting more than 30 million Americans. Previous twin studies suggest a heritability of 25% for gallstone formation. To date, one genome-wide association study (GWAS) has been performed in a population of European descent. Several candidate gene studies have been performed in various populations, but most have been inconclusive. Given that gallstones consist of up to 80% cholesterol, we hypothesized that common genetic variants associated with high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), and triglycerides (TG) would also be associated with gallstone risk. METHODS: To test this hypothesis, the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) study, as part of the Population Architecture using Genomics and Epidemiology (PAGE) study, performed tests of association between 49 GWAS-identified lipid trait SNPs and gallstone disease in non-Hispanic whites (446 cases and 1,962 controls), non-Hispanic blacks (179 cases and 1,540 controls), and Mexican Americans (227 cases and 1,478 controls) ascertained for the population-based Third National Health and Nutrition Examination Survey (NHANES III). RESULTS: At a liberal significance threshold of 0.05, five, four, and four SNPs were associated with disease risk in non-Hispanic whites, non-Hispanic blacks, and Mexican Americans, respectively. No single SNP was associated with gallstone disease risk in all three racial/ethnic groups. The most significant association was observed for ABCG5 rs6756629 in non-Hispanic whites [odds ratio (OR) = 1.89; 95% confidence interval (CI) = 1.44-2.49; p = 0.0001]. ABCG5 rs6756629 is in strong linkage disequilibrium with rs11887534 (D19H), a variant previously associated with gallstone disease risk in populations of European descent. CONCLUSIONS: We replicated a previously associated variant for gallstone disease risk in non-Hispanic whites. Further discovery and fine-mapping efforts in diverse populations are needed to fully describe the genetic architecture of gallstone disease risk in humans.
Subjects
Cholesterol, HDL/genetics , Cholesterol, LDL/genetics , Gallstones/genetics , Genetic Variation , Triglycerides/genetics , ATP Binding Cassette Transporter, Subfamily G, Member 5 , ATP-Binding Cassette Transporters/genetics , Adult , Black or African American/genetics , Aged , Case-Control Studies , Female , Genetic Predisposition to Disease , Genome-Wide Association Study , Health Surveys , Humans , Linkage Disequilibrium , Lipoproteins/genetics , Male , Mexican Americans/genetics , Middle Aged , Nutrition Surveys , Polymorphism, Single Nucleotide , United States , White People/genetics
ABSTRACT
We describe here the extraction of country-of-origin, an acculturation variable relevant for gene-environment studies, in a biorepository linked to de-identified electronic health records (EHRs) accessed by the Epidemiologic Architecture for Genes Linked to Environment (EAGLE), a study site of the Population Architecture using Genomics and Epidemiology (PAGE) I study. We extracted country-of-origin from the unstructured clinical free text using regular expressions within the MySQL relational database system in a cohort of 15,863 subjects of mostly non-European descent (including 11,519 African Americans, 1,702 Hispanics, and 1,118 Asians). We performed searches for 231 world countries (including independent sovereign states, dependent areas, and disputed territories) and common misspellings in >14 gigabytes of data including >13 billion characters of clinical text. Manual review of a fraction of the initial country-of-origin assignments established rules for data cleaning and quality control to achieve final country-of-origin status for each subject. After data cleaning, a total of 1,911/15,893 (12.02%) subjects were assigned to a country-of-origin outside of the United States. Mexico was the most commonly assigned country outside of the United States (264 subjects; 13.8% of subjects with a foreign country-of-origin assignment). The distribution of the countries assigned followed expectations based on known migration patterns to the United States with an emphasis on the southeastern region. These data suggest country-of-origin can be successfully extracted from unstructured clinical text for downstream genetic association studies.
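The general idea of matching country names and common misspellings in free text can be sketched with regular expressions. The study ran its searches inside MySQL; the sketch below uses Python's `re` module instead, and the trigger phrases, the three-country variant table, and the function names are all hypothetical stand-ins rather than the rules actually used.

```python
import re

# Hypothetical variant table: canonical country -> spellings to search for.
COUNTRY_VARIANTS = {
    "Mexico": ["Mexico", "Mexcio"],
    "Guatemala": ["Guatemala", "Guatamala"],
    "Vietnam": ["Vietnam", "Viet Nam"],
}

# Hypothetical trigger phrases that suggest a statement about origin.
TRIGGERS = r"(?:born in|native of|originally from|immigrated from)"


def build_pattern(variants):
    # Longest spelling first so longer variants are preferred in alternation.
    alternatives = "|".join(
        re.escape(v) for v in sorted(variants, key=len, reverse=True)
    )
    return re.compile(
        r"\b" + TRIGGERS + r"\s+(" + alternatives + r")\b", re.IGNORECASE
    )


PATTERNS = {c: build_pattern(v) for c, v in COUNTRY_VARIANTS.items()}


def candidate_countries(note):
    """Return canonical countries mentioned after a trigger phrase."""
    return sorted(c for c, p in PATTERNS.items() if p.search(note))


print(candidate_countries("45 yo male originally from Mexcio, here since 2001."))
print(candidate_countries("Pt ate at a Mexico-themed restaurant last night."))
```

Requiring a trigger phrase keeps incidental mentions of a country from being treated as an origin statement. As the abstract notes, candidate assignments of this kind still need manual review and cleaning rules before they are used.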
ABSTRACT
Body mass index (BMI) is an important outcome and covariate adjustment for many clinical association studies. Accurate assessment of BMI, therefore, is a critical part of many study designs. Electronic health records (EHRs) are a growing source of clinical data for research purposes, and have proven useful for identifying and replicating genetic associations. EHR-based data collected for clinical and billing purposes have several unique properties, including a high degree of heterogeneity or "clinical noise." In this work, we propose a new method for reducing the problems of transcription and recording error for height and weight and apply these methods to a subset of the Vanderbilt University Medical Center biorepository known as EAGLE BioVU (n=15,863). After processing, we show that the distribution of BMI from EAGLE BioVU closely matches population-based estimates from the National Health and Nutrition Examination Surveys (NHANES), and that our approach retains far more data points than traditional outlier detection methods.
ABSTRACT
BACKGROUND: Biorepositories linked to de-identified electronic medical records (EMRs) have the potential to complement traditional epidemiologic studies in genotype-phenotype studies of complex human diseases and traits. A major challenge in meeting this potential is the use of EMR-derived data to extract phenotypes and covariates for genetic association studies. Unlike traditional epidemiologic data, EMR-derived data are collected for clinical care and are therefore highly variable across patients. The variability of clinical data coupled with the challenges associated with searching unstructured clinical notes requires the development of algorithms to extract phenotypes for analysis. Given the number of possible algorithms that could be developed for any one EMR-derived phenotype, we explored here the impact algorithm decision logic has on genetic association study results for a single quantitative trait, high density lipoprotein cholesterol (HDL-C). RESULTS: We used five different algorithms to extract HDL-C from African American subjects genotyped on the Illumina Metabochip (n = 11,519) as part of Epidemiologic Architecture for Genes Linked to Environment (EAGLE). Tests of association between HDL-C and genetic risk scores for HDL-C associated variants suggest that the genetic effect size does not vary substantially across the five HDL-C definitions. CONCLUSIONS: These data collectively suggest that, at least for this quantitative trait, algorithm decision logic and phenotyping details do not appreciably impact genetic association study test statistics.
ABSTRACT
BACKGROUND: Racial/ethnic differences for commonly measured clinical variables are well documented, and it has been postulated that population-specific genetic factors may play a role. The genetic heterogeneity of admixed populations, such as African Americans, provides a unique opportunity to identify genomic regions and variants associated with the clinical variability observed for diseases and traits across populations. METHOD: To begin a systematic search for these population-specific genomic regions at the phenome-wide scale, we determined the relationship between global genetic ancestry, specifically European and African ancestry, and clinical variables measured in a population of African Americans from BioVU, Vanderbilt University's biorepository linked to de-identified electronic medical records (EMRs), as part of the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) study. Through billing (ICD-9) codes, procedure codes, labs, and clinical notes, 36 common clinical and laboratory variables were mined from the EMR, including body mass index (BMI), kidney traits, lipid levels, blood pressure, and electrocardiographic measurements. A total of 15,863 DNA samples from non-European Americans were genotyped on the Illumina Metabochip containing ~200,000 variants, of which 11,166 were from African Americans. Tests of association were performed to examine associations between global ancestry and the phenotype of interest. RESULTS: Increased European ancestry, and conversely decreased African ancestry, was most strongly correlated with an increase in QRS duration, consistent with previous observations that African Americans tend to have a shorter QRS duration compared with European Americans. Despite known racial/ethnic disparities in blood pressure, European and African ancestry was associated with neither diastolic nor systolic blood pressure measurements. CONCLUSION: Collectively, these results suggest that this clinical population can be used to identify traits in which population differences may be due, in part, to population-specific genetics.
ABSTRACT
BACKGROUND: Mitochondria play a critical role in the cell and have DNA independent of the nuclear genome. There is much evidence that mitochondrial DNA (mtDNA) variation plays a role in human health and disease; however, this area of investigation has lagged behind research into the role of nuclear genetic variation in complex traits and phenotypic outcomes. Phenome-wide association studies (PheWAS) investigate the association between a wide range of traits and genetic variation. To date, this approach has not been used to investigate the relationship between mtDNA variants and phenotypic variation. Herein, we describe the development of a PheWAS framework for mtDNA variants (mt-PheWAS). Using the Metabochip custom genotyping array, nuclear and mitochondrial DNA variants were genotyped in 11,519 African Americans from the Vanderbilt University biorepository, BioVU. We employed both polygenic modeling and association testing with mitochondrial single nucleotide polymorphisms (mtSNPs) to explore the relationship between mtDNA variants and a group of eight cardiovascular-related traits obtained from de-identified electronic medical records within BioVU. RESULTS: Using polygenic modeling, we found evidence for an effect of mtDNA variation on total cholesterol and type 2 diabetes (T2D). After performing comprehensive mitochondrial single-SNP associations, we identified an increased number of single mtSNP associations with total cholesterol and T2D compared to the other phenotypes examined, which did not have more significantly associated SNPs than would be expected by chance. Among the mtSNPs significantly associated with T2D, we identified variant mt16189, an association previously reported only in Asian and European-descent populations. CONCLUSIONS: Our replication of previous findings and identification of novel associations from this initial study suggest that our mt-PheWAS approach is robust for investigating the relationship between mitochondrial genetic variation and a range of phenotypes, providing a framework for future mt-PheWAS.
ABSTRACT
BACKGROUND: A founder mutation was recently discovered and described as conferring favorable lipid profiles and reduced subclinical atherosclerotic disease in a Pennsylvania Amish population. Preliminary data have suggested that this null mutation APOC3 R19X (rs76353203) is rare in the general population. METHODS AND RESULTS: To better describe the frequency and lipid profile in the general population, we, as part of the Population Architecture using Genomics and Epidemiology I Study and the Epidemiological Architecture for Genes Linked to Environment Study, genotyped rs76353203 in 1113 Amish participants from Ohio and Indiana and 19 613 participants from the National Health and Nutrition Examination Surveys (NHANES III, 1999 to 2002, and 2007 to 2008). We found no carriers among the Ohio and Indiana Amish. Of the 19 613 NHANES participants, we identified 31 participants carrying the 19X allele, for an overall allele frequency of 0.08%. Among fasting adults, the 19X allele was associated with lower triglycerides (n=7603; β=-71.20; P=0.007) and higher high-density lipoprotein cholesterol (n=8891; β=15.65; P=0.0002) and, although not significant, lower low-density lipoprotein cholesterol (n=6502; β=-4.85; P=0.68) after adjustment for age, sex, and race/ethnicity. On average, 19X allele participants had approximately half the triglyceride levels (geometric means, 51.3 to 69.7 versus 134.6 to 141.3 mg/dL), >20% higher high-density lipoprotein cholesterol levels (geometric means, 56.8 to 74.4 versus 50.38 to 53.36 mg/dL), and lower low-density lipoprotein cholesterol levels (geometric means, 104.5 to 128.6 versus 116.1 to 125.7 mg/dL) compared with noncarrier participants. CONCLUSIONS: These data demonstrate that APOC3 19X exists in the general US population in multiple racial/ethnic groups and is associated with cardio-protective lipid profiles.
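The reported allele frequency can be checked with a short calculation. The sketch below assumes that all 31 carriers are heterozygous, which the abstract does not state; the function name is illustrative.

```python
def allele_frequency(heterozygotes, homozygotes, n_participants):
    """Allele frequency in a diploid sample: allele copies / total chromosomes."""
    copies = heterozygotes + 2 * homozygotes
    return copies / (2 * n_participants)


# Assumption: all 31 carriers among 19,613 participants are heterozygous.
freq = allele_frequency(heterozygotes=31, homozygotes=0, n_participants=19613)
print(f"{freq:.2%}")  # prints 0.08%
```

Under that assumption, 31 copies among 39,226 chromosomes gives about 0.079%, which rounds to the 0.08% stated in the abstract.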
Subjects
Apolipoprotein C-III/genetics , Adult , Aged , Alleles , Amish/genetics , Atherosclerosis/genetics , Atherosclerosis/pathology , Cholesterol, HDL/blood , Cholesterol, LDL/blood , Female , Gene Frequency , Genotype , Haplotypes , Humans , Male , Middle Aged , Nutrition Surveys , Polymorphism, Single Nucleotide , Triglycerides/blood
ABSTRACT
Genetic association studies have rapidly become a major tool for identifying the genetic basis of common human diseases. The advent of cost-effective genotyping coupled with large collections of samples linked to clinical outcomes and quantitative traits now make it possible to systematically characterize genotype-phenotype relationships in diverse populations and extensive datasets. To capitalize on these advancements, the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) project, as part of the collaborative Population Architecture using Genomics and Epidemiology (PAGE) study, accesses two collections: the National Health and Nutrition Examination Surveys (NHANES) and BioVU, Vanderbilt University's biorepository linked to de-identified electronic medical records. We describe herein the workflows for accessing and using the epidemiologic (NHANES) and clinical (BioVU) collections, where each workflow has been customized to reflect the content and data access limitations of each respective source. We also describe the process by which these data are generated, standardized, and shared for meta-analysis among the PAGE study sites. As a specific example of the use of BioVU, we describe the data mining efforts to define cases and controls for genetic association studies of common cancers in PAGE. Collectively, the efforts described here are a generalized outline for many of the successful approaches that can be used in the era of high-throughput genotype-phenotype associations for moving biomedical discovery forward to new frontiers of data generation and analysis.
Subjects
Gene-Environment Interaction , Genetic Association Studies/statistics & numerical data , Computational Biology , Databases, Nucleic Acid/statistics & numerical data , Genetics, Population/statistics & numerical data , High-Throughput Screening Assays/statistics & numerical data , Humans , Linear Models , Neoplasms/genetics , Nutrition Surveys/statistics & numerical data , Polymorphism, Single Nucleotide , Registries/statistics & numerical data
ABSTRACT
BACKGROUND: A number of genetic variants have been discovered by recent genome-wide association studies for their associations with clinical coronary heart disease (CHD). However, it is unclear whether these variants are also associated with the development of CHD as measured by the subclinical atherosclerosis phenotypes ankle brachial index (ABI), carotid artery intima-media thickness (cIMT), and carotid plaque. METHODS: Ten CHD risk single nucleotide polymorphisms (SNPs) were genotyped in individuals of European American (EA), African American (AA), American Indian (AI), and Mexican American (MA) ancestry in the Population Architecture using Genomics and Epidemiology (PAGE) study. In each individual study, we performed linear or logistic regression to examine population-specific associations between SNPs and ABI, common and internal cIMT, and plaque. The results from individual studies were meta-analyzed using a fixed-effect inverse-variance-weighted model. RESULTS: None of the ten SNPs was significantly associated with ABI or with common or internal cIMT after Bonferroni correction. In the sample of 13,337 EA, 3809 AA, and 5353 AI individuals with carotid plaque measurement, the GCKR SNP rs780094 was significantly associated with the presence of plaque in AI (OR = 1.32, 95% confidence interval: 1.17, 1.49, P = 1.08 × 10⁻⁵), but not in the other populations (P = 0.90 in EA and P = 0.99 in AA). A 9p21 region SNP, rs1333049, was nominally associated with plaque in EA (OR = 1.07, P = 0.02) and in AI (OR = 1.10, P = 0.05). CONCLUSIONS: We identified a significant association between rs780094 and plaque in AI populations, which needs to be replicated in future studies. There was little evidence that the index CHD risk variants identified through genome-wide association studies in EA influence the development of CHD through subclinical atherosclerosis as assessed by cIMT and ABI across ancestries.
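For readers unfamiliar with the fixed-effect inverse-variance-weighted model mentioned in the methods, a minimal sketch follows. It implements the standard textbook estimator, not the study's own pipeline, and the per-study estimates used in the example are made up.

```python
import math


def fixed_effect_meta(estimates):
    """Combine (beta, standard_error) pairs with inverse-variance weights.

    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, z, p


# Made-up log odds ratios and standard errors from three hypothetical studies.
studies = [(0.10, 0.05), (0.06, 0.04), (0.12, 0.08)]
beta, se, z, p = fixed_effect_meta(studies)
print(f"pooled OR = {math.exp(beta):.3f}, SE = {se:.4f}, P = {p:.2g}")
```

Each study is weighted by the reciprocal of its variance, so more precise studies contribute more to the pooled estimate, and the pooled standard error is never larger than that of the most precise single study.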