ABSTRACT
Uterine fibroids (UF) are common, heritable pelvic tumors in women, and genome-wide association studies (GWAS) have identified ~30 loci associated with increased UF risk. Using summary statistics from a previously published UF GWAS performed in a non-Hispanic European ancestry (NHW) female subset from the Electronic Medical Records and Genomics (eMERGE) Network, we constructed a polygenic risk score (PRS) for UF. The UF-PRS was developed using PRSice and optimized in the separate clinical population of BioVU. The PRS was validated using parallel methods of 10-fold cross-validation logistic regression and a phenome-wide association study (PheWAS) in a separate subset of eMERGE NHW females (validation set), excluding samples used in the GWAS. PRSice determined a p-value threshold of pt < 0.001, and after linkage disequilibrium pruning (r^2 < 0.2), 4458 variants were included in the PRS, which was significant (pseudo-R^2 = 0.0018, p = 0.041). Ten-fold cross-validation logistic regression modeling of the validation set yielded an area under the receiver operating characteristic (ROC) curve (AUC) of 0.60 (95% confidence interval [CI] 0.58-0.62). PheWAS identified six phecodes associated with the PRS, the most significant being 218 'benign neoplasm of uterus' and 218.1 'uterine leiomyoma' (p = 1.94 × 10^-23, OR 1.31 [95% CI 1.26-1.37] and p = 3.50 × 10^-23, OR 1.32 [95% CI 1.26-1.37], respectively). We have developed and validated the first PRS for UF. Our PRS has predictive ability for UF and captures genetic architecture of increased UF risk that can be used in further studies.
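As an illustration of the scoring step described in this abstract, the following is a minimal sketch of a thresholded PRS: a sum of effect-size-weighted allele dosages over variants whose GWAS p-value falls below the chosen threshold. It is not the PRSice implementation. The function name, variant identifiers, weights, and dosages are hypothetical, and linkage disequilibrium pruning is assumed to have been applied beforehand.

```python
from typing import Dict


def polygenic_risk_score(
    dosages: Dict[str, float],
    weights: Dict[str, float],
    pvalues: Dict[str, float],
    p_threshold: float = 0.001,
) -> float:
    """Sum weight * dosage over variants passing the p-value threshold.

    dosages: risk-allele dosage (0-2) per variant for one individual.
    weights: GWAS effect size (log odds ratio) per variant.
    pvalues: GWAS p-value per variant; variants without one are skipped.
    """
    score = 0.0
    for variant, dosage in dosages.items():
        if variant in weights and pvalues.get(variant, 1.0) < p_threshold:
            score += weights[variant] * dosage
    return score


# Hypothetical summary statistics and one hypothetical individual.
example_weights = {"varA": 0.10, "varB": -0.05, "varC": 0.20}
example_pvalues = {"varA": 1e-5, "varB": 5e-4, "varC": 0.03}
example_person = {"varA": 2, "varB": 1, "varC": 1}

example_score = polygenic_risk_score(example_person, example_weights, example_pvalues)
```

In this toy input, varC does not contribute because its p-value exceeds the threshold; in practice the threshold itself is tuned on a separate dataset, as the abstract describes for BioVU.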
Subjects
Genome-Wide Association Study , Leiomyoma , Female , Genetic Predisposition to Disease , Genomics , Humans , Leiomyoma/genetics , Linkage Disequilibrium , Risk Factors
ABSTRACT
BACKGROUND: Polycystic ovary syndrome is the most common endocrine disorder affecting women of reproductive age. A number of criteria have been developed for clinical diagnosis of polycystic ovary syndrome, with the Rotterdam criteria being the most inclusive. Evidence suggests that polycystic ovary syndrome is significantly heritable, and previous studies have identified genetic variants associated with polycystic ovary syndrome diagnosed using different criteria. The widely adopted electronic health record system provides an opportunity to identify patients with polycystic ovary syndrome using the Rotterdam criteria for genetic studies. OBJECTIVE: To identify novel associated genetic variants under the same phenotype definition, we extracted polycystic ovary syndrome cases and unaffected controls based on the Rotterdam criteria from the electronic health records and performed a discovery-validation genome-wide association study. STUDY DESIGN: We developed a polycystic ovary syndrome phenotyping algorithm on the basis of the Rotterdam criteria and applied it to 3 electronic health record-linked biobanks to identify cases and controls for genetic study. In the discovery phase, we performed an individual genome-wide association study using the Geisinger MyCode and the Electronic Medical Records and Genomics cohorts, which were then meta-analyzed. We attempted validation of the significant association loci (P<1×10^-6) in the BioVU cohort. All association analyses used logistic regression, assuming an additive genetic model, and adjusted for principal components to control for population stratification. An inverse-variance fixed-effect model was adopted for meta-analysis. In addition, we examined the top variants to evaluate their associations with each criterion in the phenotyping algorithm. We used the STRING database to characterize the protein-protein interaction network.
RESULTS: Using the same algorithm based on the Rotterdam criteria, we identified 2995 patients with polycystic ovary syndrome and 53,599 population controls in total (2742 cases and 51,438 controls from the discovery phase; 253 cases and 2161 controls in the validation phase). We identified 1 novel genome-wide significant variant rs17186366 (odds ratio [OR]=1.37 [1.23, 1.54], P=2.8×10^-8) located near SOD2. In addition, 2 loci with suggestive association were also identified: rs113168128 (OR=1.72 [1.42, 2.10], P=5.2×10^-8), an intronic variant of ERBB4 that is independent from the previously published variants, and rs144248326 (OR=2.13 [1.52, 2.86], P=8.45×10^-7), a novel intronic variant in WWTR1. In the further association tests of the top 3 single-nucleotide polymorphisms with each criterion in the polycystic ovary syndrome algorithm, we found that rs17186366 (SOD2) was associated with polycystic ovaries and hyperandrogenism, whereas rs113168128 (ERBB4) and rs144248326 (WWTR1) were mainly associated with oligomenorrhea or infertility. We also validated the previously reported association with DENND1A1. Using the STRING database to characterize protein-protein interactions, we found both ERBB4 and WWTR1 can interact with YAP1, which has been previously associated with polycystic ovary syndrome. CONCLUSION: Through a discovery-validation genome-wide association study on polycystic ovary syndrome identified from electronic health records using an algorithm based on Rotterdam criteria, we identified and validated a novel genome-wide significant association with a variant near SOD2. We also identified a novel independent variant within ERBB4 and a suggestive association with WWTR1. With previously identified polycystic ovary syndrome gene YAP1, the ERBB4-YAP1-WWTR1 network suggests involvement of the epidermal growth factor receptor and the Hippo pathway in the multifactorial etiology of polycystic ovary syndrome.
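The inverse-variance fixed-effect meta-analysis named in the study design can be sketched as follows. This is a generic textbook formulation rather than the authors' pipeline. The function name and the input values are hypothetical, and per-cohort effect sizes are assumed to be log odds ratios with their standard errors.

```python
import math
from typing import List, Tuple


def fixed_effect_meta(betas: List[float], ses: List[float]) -> Tuple[float, float, float, float]:
    """Combine per-cohort effects with inverse-variance weights.

    Returns (pooled beta, pooled standard error, z statistic, two-sided p-value).
    """
    weights = [1.0 / (se ** 2) for se in ses]           # w_i = 1 / se_i^2
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))        # two-sided normal p-value
    return pooled_beta, pooled_se, z, p_value


# Hypothetical log odds ratios and standard errors from two discovery cohorts.
meta_beta, meta_se, meta_z, meta_p = fixed_effect_meta([0.2, 0.4], [0.1, 0.1])
pooled_odds_ratio = math.exp(meta_beta)
```

Because the weights are the reciprocals of the variances, larger cohorts with smaller standard errors dominate the pooled estimate; a fixed-effect model additionally assumes that all cohorts share one true effect.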
Subjects
Polycystic Ovary Syndrome/genetics , Receptor, ErbB-4/genetics , Trans-Activators/genetics , Adaptor Proteins, Signal Transducing/metabolism , Adult , Case-Control Studies , Electronic Health Records , Female , Genome-Wide Association Study , Humans , Hyperandrogenism/genetics , Infertility, Female/genetics , Middle Aged , Oligomenorrhea/genetics , Ovarian Cysts/genetics , Polycystic Ovary Syndrome/diagnosis , Polycystic Ovary Syndrome/physiopathology , Polymorphism, Single Nucleotide , Superoxide Dismutase/genetics , Transcription Factors/metabolism , Transcriptional Coactivator with PDZ-Binding Motif Proteins , YAP-Signaling Proteins
ABSTRACT
CONTEXT: As many as 75% of patients with polycystic ovary syndrome (PCOS) are estimated to be unidentified in clinical practice. OBJECTIVE: Utilizing polygenic risk prediction, we aim to identify the phenome-wide comorbidity patterns characteristic of PCOS to improve accurate diagnosis and preventive treatment. DESIGN, PATIENTS, AND METHODS: Leveraging the electronic health records (EHRs) of 124,852 individuals, we developed a PCOS risk prediction algorithm by combining polygenic risk scores (PRS) with PCOS component phenotypes into a polygenic and phenotypic risk score (PPRS). We evaluated its predictive capability across different ancestries and performed a PRS-based phenome-wide association study (PheWAS) to assess the phenomic expression of the heightened risk of PCOS. RESULTS: The integrated polygenic prediction improved the average performance (pseudo-R^2) for PCOS detection by 0.228 (61.5-fold), 0.224 (58.8-fold), and 0.211 (57.0-fold) over the null model across European, African, and multi-ancestry participants, respectively. The subsequent PRS-powered PheWAS identified a high level of shared biology between PCOS and a range of metabolic and endocrine outcomes, especially obesity and diabetes, with "morbid obesity", "type 2 diabetes", "hypercholesterolemia", "disorders of lipid metabolism", "hypertension", and "sleep apnea" reaching phenome-wide significance. CONCLUSIONS: Our study has expanded the methodological utility of PRS in patient stratification and risk prediction, especially in a multifactorial condition like PCOS, across different genetic origins. By utilizing the individual genome-phenome data available from the EHR, our approach also demonstrates that polygenic prediction by PRS can provide valuable opportunities to discover the pleiotropic phenomic network associated with PCOS pathogenesis.
Subjects
Algorithms , Genome-Wide Association Study , Multifactorial Inheritance/genetics , Phenomics/methods , Phenotype , Polycystic Ovary Syndrome/diagnosis , Adolescent , Aged , Case-Control Studies , Child , Electronic Health Records , Female , Follow-Up Studies , Genetic Predisposition to Disease , Humans , Middle Aged , Polycystic Ovary Syndrome/epidemiology , Polycystic Ovary Syndrome/genetics , Prognosis , Risk Factors
ABSTRACT
Uterine fibroids affect up to 77% of women by menopause and account for up to $34 billion in healthcare costs each year. Although fibroid risk is heritable, genetic risk for fibroids is not well understood. We conducted a two-stage case-control meta-analysis of genetic variants in European and African ancestry women with and without fibroids classified by a previously published algorithm requiring pelvic imaging or confirmed diagnosis. Women from seven electronic Medical Records and Genomics (eMERGE) network sites (3,704 imaging-confirmed cases and 5,591 imaging-confirmed controls) and women of African and European ancestry from UK Biobank (UKB, 5,772 cases and 61,457 controls) were included in the discovery genome-wide association study (GWAS) meta-analysis. Variants showing evidence of association in Stage I GWAS (P < 1 × 10^-5) were targeted in an independent replication sample of African and European ancestry individuals from the UKB (Stage II) (12,358 cases and 138,477 controls). Logistic regression models were fit with genetic markers imputed to a 1000 Genomes reference and adjusted for principal components for each race- and site-specific dataset, followed by fixed-effects meta-analysis. Final analysis with 21,804 cases and 205,525 controls identified 326 genome-wide significant variants in 11 loci, with three novel loci at chromosome 1q24 (sentinel-SNP rs14361789; P = 4.7 × 10^-8), chromosome 16q12.1 (sentinel-SNP rs4785384; P = 1.5 × 10^-9) and chromosome 20q13.1 (sentinel-SNP rs6094982; P = 2.6 × 10^-8). Our statistically significant findings further support previously reported loci including SNPs near WT1, TNRC6B, SYNE1, BET1L, and CDC42/WNT4. We report evidence of ancestry-specific findings for sentinel-SNP rs10917151 in the CDC42/WNT4 locus (P = 1.76 × 10^-24). Ancestry-specific effect-estimates for rs10917151 were in opposite directions (P-Het-between-groups = 0.04) for predominantly African (OR = 0.84) and predominantly European women (OR = 1.16).
Genetically-predicted expression of several genes, including LUZP1 in vagina (P = 4.6 × 10^-8), OBFC1 in esophageal mucosa (P = 8.7 × 10^-8), NUDT13 in multiple tissues including subcutaneous adipose tissue (P = 3.3 × 10^-6), and HEATR3 in skeletal muscle tissue (P = 5.8 × 10^-6), was associated with fibroids. The finding for HEATR3 was supported by SNP-based summary Mendelian randomization analysis. Our study suggests that fibroid risk variants act through regulatory mechanisms affecting gene expression and comprise alleles that are both ancestry-specific and shared across continental ancestries.
ABSTRACT
Resting-state white blood cell (WBC) count is a marker of inflammation and immune system health. There is evidence that WBC count is not fixed over time and that there is heterogeneity in WBC trajectory that is associated with morbidity and mortality. Latent class mixed modeling (LCMM) is a method that can identify unobserved heterogeneity in longitudinal data and attempts to classify individuals into groups based on a linear model of repeated measurements. We applied LCMM to repeated WBC count measures derived from electronic medical records of participants of the National Human Genome Research Institute (NHGRI) electronic MEdical Record and GEnomics (eMERGE) network study, revealing two WBC count trajectory phenotypes. Advancing these phenotypes to GWAS, we found genetic associations between trajectory class membership and regions on chromosome 1p34.3 and chromosome 11q13.4. The chromosome 1 region contains CSF3R, which encodes the granulocyte colony-stimulating factor receptor. This protein is a major factor in neutrophil stimulation and proliferation. The associated region on chromosome 11 contains the genes RNF169 and XRRA1, both involved in the regulation of double-strand break DNA repair.
Subjects
Leukocyte Count/methods , Leukocytes/classification , Adult , Aged , Databases, Genetic , Electronic Health Records , Female , Genome-Wide Association Study , Humans , Latent Class Analysis , Male , Middle Aged , Phenotype , Polymorphism, Single Nucleotide/genetics , Proteins/genetics , Receptors, Colony-Stimulating Factor/genetics , Ubiquitin-Protein Ligases/genetics
ABSTRACT
BACKGROUND: The MyCode Community Health Initiative (MyCode) is returning actionable results from whole exome sequencing. Familial hypercholesterolemia (FH) is an inherited condition characterized by premature cardiovascular disease. METHODS: We used multiple methods to assess care in 28 MyCode participants who received FH results. Chart reviews were conducted on 23 individuals in the sample and 7 individuals participated in semistructured interviews. RESULTS: Chart reviews for 23 individuals with a Geisinger primary care provider found that 4 individuals (17% of 23) were at LDL-C (low-density lipoprotein cholesterol) goal (either LDL-C <100 mg/dL for primary prevention or LDL-C <70 mg/dL for secondary prevention) and 17 individuals (74% of 23) were prescribed lipid-lowering therapy before genetic result disclosure. After disclosure of the genetic test result, 5 individuals (22% of 23) met their LDL-C goal and 18 individuals (78% of 23) were prescribed lipid-lowering therapy. Follow-up care about this result was not documented for 4 individuals (17% of 23). Changes to intensity of medication management were made for 8 individuals (47% of 17 individuals previously prescribed lipid-lowering therapy). Interviewed individuals (n=7) were not surprised by their result as all knew they had high cholesterol; however, individuals did not seem to discern FH as a separate condition from their high cholesterol. CONCLUSIONS: Among individuals receiving a genetic diagnosis of FH, >25% had no changes to lipid-lowering therapy, despite not being at LDL-C goal and learning their high cholesterol is related to a genetic condition requiring more aggressive treatment. Individuals and clinicians may have an inadequate understanding of FH as a distinct condition requiring enhanced medical management.
Subjects
Attitude to Health , Genetic Testing , Hyperlipoproteinemia Type II/diagnosis , Hyperlipoproteinemia Type II/therapy , Patient Acceptance of Health Care , Perception , Adult , Aged , Aged, 80 and over , Anticholesteremic Agents/therapeutic use , Apolipoprotein B-100/genetics , Cohort Studies , Female , Humans , Hyperlipoproteinemia Type II/epidemiology , Hyperlipoproteinemia Type II/genetics , Male , Middle Aged , Patient Acceptance of Health Care/psychology , Patient Acceptance of Health Care/statistics & numerical data , Receptors, LDL/genetics , Secondary Prevention/methods , Secondary Prevention/statistics & numerical data
ABSTRACT
PURPOSE: Arrhythmogenic right ventricular cardiomyopathy (ARVC) is an inherited heart disease. Clinical follow-up of incidental findings in ARVC-associated genes is recommended. We aimed to determine the prevalence of disease thus ascertained. METHODS: Individuals (n = 30,716) underwent exome sequencing. Variants in PKP2, DSG2, DSC2, DSP, JUP, TMEM43, or TGFβ3 that were database-listed as pathogenic or likely pathogenic were identified and evidence-reviewed. For subjects with putative loss-of-function (pLOF) variants or variants of uncertain significance (VUS), electronic health records (EHR) were reviewed for ARVC diagnosis, diagnostic criteria, and International Classification of Diseases (ICD-9) codes. RESULTS: Eighteen subjects had pLOF variants; none of these had an EHR diagnosis of ARVC. Of 14 patients with an electrocardiogram, one had a minor diagnostic criterion; the rest were normal. A total of 184 subjects had VUS, none of whom had an ARVC diagnosis. The proportion of subjects with VUS with major (4%) or minor (13%) electrocardiogram diagnostic criteria did not differ from that of variant-negative controls. ICD-9 codes showed no difference in defibrillator use, electrophysiologic abnormalities or nonischemic cardiomyopathies in patients with pLOF or VUSs compared with controls. CONCLUSION: pLOF variants in an unselected cohort were not associated with ARVC phenotypes based on EHR review. The negative predictive value of EHR review remains uncertain.
Subjects
Arrhythmogenic Right Ventricular Dysplasia/genetics , Exome , Genetic Variation , Sequence Analysis, DNA , Adult , Arrhythmogenic Right Ventricular Dysplasia/epidemiology , Cohort Studies , Electronic Health Records , Female , Genetic Association Studies , Genotype , Humans , Male , Middle Aged , Phenotype , Prevalence
ABSTRACT
We performed a Phenome-Wide Association Study (PheWAS) to identify interrelationships between the immune system genetic architecture and a wide array of phenotypes from two de-identified electronic health record (EHR) biorepositories. We selected variants within genes encoding critical factors in the immune system and variants with known associations with autoimmunity. To define case/control status for EHR diagnoses, we used International Classification of Diseases, Ninth Revision (ICD-9) diagnosis codes from 3,024 Geisinger Clinic MyCode® subjects (470 diagnoses) and 2,899 Vanderbilt University Medical Center BioVU biorepository subjects (380 diagnoses). A pooled analysis was also carried out for the replicating results of the two data sets. We identified new associations with potential biological relevance including SNPs in tumor necrosis factor (TNF) and ankyrin-related genes associated with acute and chronic sinusitis and acute respiratory tract infection. The two most significant associations identified were for the C6orf10 SNP rs6910071 and "rheumatoid arthritis" (ICD-9 code category 714) (pMETAL = 2.58 × 10^-9) and the ATN1 SNP rs2239167 and "diabetes mellitus, type 2" (ICD-9 code category 250) (pMETAL = 6.39 × 10^-9). This study highlights the utility of using PheWAS in conjunction with EHRs to discover new genotypic-phenotypic associations for immune-system related genetic loci.
Subjects
Genetic Association Studies , Immune System/metabolism , Ankyrins/genetics , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/pathology , Electronic Health Records , Genetic Loci , Genotype , Humans , Linkage Disequilibrium , Nerve Tissue Proteins/genetics , Phenotype , Polymorphism, Single Nucleotide , Respiratory Tract Infections/genetics , Respiratory Tract Infections/pathology , Sinusitis/genetics , Sinusitis/pathology , Tumor Necrosis Factor-alpha/genetics
ABSTRACT
Investigating the association between biobank derived genomic data and the information of linked electronic health records (EHRs) is an emerging area of research for dissecting the architecture of complex human traits, where cases and controls for study are defined through the use of electronic phenotyping algorithms deployed in large EHR systems. For our study, cataract cases and controls were identified within the Marshfield Personalized Medicine Research Project (PMRP) biobank and linked EHR, which is a member of the NHGRI-funded electronic Medical Records and Genomics (eMERGE) Network. Our goal was to explore potential gene-gene and gene-environment interactions within these data for 527,953 and 527,936 single nucleotide polymorphisms (SNPs) for gene-gene and gene-environment analyses, respectively, with minor allele frequency > 1%, in order to explore higher level associations with cataract risk beyond investigations of single SNP-phenotype associations. To build our SNP-SNP interaction models we utilized a prior-knowledge driven filtering method called Biofilter to minimize the multiple testing burden of exploring the vast array of interaction models possible from our extensive number of SNPs. Using Biofilter, we developed 57,376 prior-knowledge directed SNP-SNP models to test for association with cataract status. We selected models that required 6 sources of external domain knowledge. We identified 13 statistically significant SNP-SNP models with an interaction with p-value < 1 × 10^-4, as well as an overall model with p-value < 0.01 associated with cataract status. We also conducted gene-environment interaction analyses for all GWAS SNPs and a set of environmental factors from the PhenX Toolkit: smoking, UV exposure, and alcohol use; these environmental factors have been previously associated with the formation of cataracts.
We found a total of 782 gene-environment models that exhibit an interaction with a p-value < 1 × 10^-4 associated with cataract status. Our results show these approaches enable advanced searches for epistasis and gene-environment interactions beyond GWAS, and that the EHR based approach provides an additional source of data for seeking these advanced explanatory models of the etiology of complex disease/outcome such as cataracts.
Subjects
Cataract/genetics , Algorithms , Biological Specimen Banks , Case-Control Studies , Computational Biology , Databases, Genetic , Electronic Health Records , Epistasis, Genetic , Gene-Environment Interaction , Genome-Wide Association Study , Humans , Phenotype , Polymorphism, Single Nucleotide , Software
ABSTRACT
BACKGROUND: Effective cancer clinical outcome prediction for understanding of the mechanism of various types of cancer has been pursued using molecular-based data such as gene expression profiles, an approach that has promise for providing better diagnostics and supporting further therapies. However, clinical outcome prediction based on gene expression profiles varies between independent data sets. Further, single-gene expression outcome prediction is limited for cancer evaluation since genes do not act in isolation, but rather interact with other genes in complex signaling or regulatory networks. In addition, since pathways are more likely to co-operate together, it would be desirable to incorporate expert knowledge to combine pathways in a useful and informative manner. METHODS: Thus, we propose a novel approach for identifying knowledge-driven genomic interactions and applying it to discover models associated with cancer clinical phenotypes using grammatical evolution neural networks (GENN). In order to demonstrate the utility of the proposed approach, an ovarian cancer data from the Cancer Genome Atlas (TCGA) was used for predicting clinical stage as a pilot project. RESULTS: We identified knowledge-driven genomic interactions associated with cancer stage from single knowledge bases such as sources of pathway-pathway interaction, but also knowledge-driven genomic interactions across different sets of knowledge bases such as pathway-protein family interactions by integrating different types of information. Notably, an integration model from different sources of biological knowledge achieved 78.82% balanced accuracy and outperformed the top models with gene expression or single knowledge-based data types alone. Furthermore, the results from the models are more interpretable because they are framed in the context of specific biological pathways or other expert knowledge. 
CONCLUSIONS: The success of the pilot study we have presented herein will allow us to pursue further identification of models predictive of clinical cancer survival and recurrence. Understanding the underlying tumorigenesis and progression in ovarian cancer through the global view of interactions within/between different biological knowledge sources has the potential for providing more effective screening strategies and therapeutic targets for many types of cancer.
ABSTRACT
Environment-wide association studies (EWAS) provide a way to uncover the environmental mechanisms involved in complex traits in a high-throughput manner. Genome-wide association studies have led to the discovery of genetic variants associated with many common diseases but do not take into account the environmental component of complex phenotypes. This EWAS assesses the comprehensive association between environmental variables and the outcome of type 2 diabetes (T2D) in the Marshfield Personalized Medicine Research Project Biobank (Marshfield PMRP). We sought replication in two National Health and Nutrition Examination Surveys (NHANES). The Marshfield PMRP currently uses four tools for measuring environmental exposures and outcome traits: 1) the PhenX Toolkit includes standardized exposure and phenotypic measures across several domains, 2) the Diet History Questionnaire (DHQ) is a food frequency questionnaire, 3) the Measurement of a Person's Habitual Physical Activity scores the level of an individual's physical activity, and 4) electronic health records (EHR) employs validated algorithms to establish T2D case-control status. Using PLATO software, 314 environmental variables were tested for association with T2D using logistic regression, adjusting for sex, age, and BMI in over 2,200 European Americans. When available, similar variables were tested with the same methods and adjustment in samples from NHANES III and NHANES 1999-2002. Twelve and 31 associations were identified in the Marshfield samples at p<0.01 and p<0.05, respectively. Seven and 13 measures replicated in at least one of the NHANES at p<0.01 and p<0.05, respectively, with the same direction of effect. The most significant environmental exposures associated with T2D status included decreased alcohol use as well as increased smoking exposure in childhood and adulthood. 
The results demonstrate the utility of the EWAS method and survey tools for identifying environmental components of complex diseases like type 2 diabetes. These high-throughput and comprehensive investigation methods can easily be applied to investigate the relation between environmental exposures and multiple phenotypes in future analyses.
Subjects
Diabetes Mellitus, Type 2/etiology , Environment , Biological Specimen Banks , Computational Biology , Diet Records , Environmental Exposure , Female , Gene-Environment Interaction , Humans , Male , Motor Activity , Nutrition Surveys , Phenotype , Precision Medicine , Software , Wisconsin
ABSTRACT
BACKGROUND: With the recent decreasing cost of genome sequence data, there has been increasing interest in rare variants and methods to detect their association to disease. We developed BioBin, a flexible collapsing method inspired by biological knowledge that can be used to automate the binning of low frequency variants for association testing. We also built the Library of Knowledge Integration (LOKI), a repository of data assembled from public databases, which contains resources such as: dbSNP and gene Entrez database information from the National Center for Biotechnology Information (NCBI), pathway information from Gene Ontology (GO), Protein families database (Pfam), Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, NetPath - signal transduction pathways, Open Regulatory Annotation Database (ORegAnno), Biological General Repository for Interaction Datasets (BioGrid), Pharmacogenomics Knowledge Base (PharmGKB), Molecular INTeraction database (MINT), and evolutionary conserved regions (ECRs) from UCSC Genome Browser. The novelty of BioBin is access to comprehensive knowledge-guided multi-level binning. For example, bin boundaries can be formed using genomic locations from: functional regions, evolutionary conserved regions, genes, and/or pathways. METHODS: We tested BioBin using simulated data and 1000 Genomes Project low coverage data to test our method with simulated causative variants and a pairwise comparison of rare variant (MAF < 0.03) burden differences between Yoruba individuals (YRI) and individuals of European descent (CEU). Lastly, we analyzed the NHLBI GO Exome Sequencing Project Kabuki dataset (Kabuki syndrome is a congenital disorder affecting multiple organs and often involving intellectual disability), contrasted with Complete Genomics data as controls. RESULTS: The results from our simulation studies indicate that the type I error rate is controlled; however, power falls quickly for small sample sizes using variants with modest effect sizes.
Using BioBin, we were able to find simulated variants in genes with fewer than 20 loci, but found the sensitivity to be much lower in large bins. We also highlighted the scale of population stratification between two 1000 Genomes Project populations, CEU and YRI. Lastly, we were able to apply BioBin to natural biological data from dbGaP and identify an interesting candidate gene for further study. CONCLUSIONS: We have established that BioBin will be a very practical and flexible tool to analyze sequence data and potentially uncover novel associations between low frequency variants and complex disease.
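The collapsing idea behind knowledge-guided binning can be illustrated with a short sketch: rare variants (here, minor allele frequency below 0.03, the cutoff used in the abstract) are grouped into bins by a feature such as gene, and each sample receives a per-bin count of minor alleles. This is not BioBin's code or interface. The function name, the variant-to-gene map, and all data are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def bin_rare_variants(
    genotypes: Dict[str, Dict[str, int]],
    variant_to_gene: Dict[str, str],
    maf: Dict[str, float],
    maf_threshold: float = 0.03,
) -> Dict[str, Dict[str, int]]:
    """Return {sample: {gene: minor-allele count over rare variants in that gene}}.

    genotypes: minor-allele count (0, 1, or 2) per variant for each sample.
    Variants without a gene annotation or with MAF >= maf_threshold are ignored.
    """
    burden: Dict[str, Dict[str, int]] = {}
    for sample, calls in genotypes.items():
        bins: Dict[str, int] = defaultdict(int)
        for variant, minor_allele_count in calls.items():
            gene = variant_to_gene.get(variant)
            if gene is not None and maf.get(variant, 1.0) < maf_threshold:
                bins[gene] += minor_allele_count
        burden[sample] = dict(bins)
    return burden


# Hypothetical annotation, allele frequencies, and genotypes.
example_gene_map = {"v1": "GENE1", "v2": "GENE1", "v3": "GENE2"}
example_maf = {"v1": 0.01, "v2": 0.02, "v3": 0.20}
example_genotypes = {"sample1": {"v1": 1, "v2": 2, "v3": 1}, "sample2": {"v3": 2}}

example_burden = bin_rare_variants(example_genotypes, example_gene_map, example_maf)
```

The same function would bin by pathway or conserved region if the map pointed variants to those features instead of genes; the per-bin counts can then be compared between cases and controls with a standard association test.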
Subjects
Abnormalities, Multiple/genetics , Caveolin 2/genetics , Computational Biology , Genetic Variation/genetics , Hematologic Diseases/genetics , Kruppel-Like Transcription Factors/genetics , Nerve Tissue Proteins/genetics , Polydactyly/genetics , Software , Vestibular Diseases/genetics , Case-Control Studies , Computer Simulation , Databases, Genetic , Exome/genetics , Face/abnormalities , Genome, Human , Genome-Wide Association Study , Genomics , Humans , Phenotype , Randomized Controlled Trials as Topic , Zinc Finger Protein Gli3
ABSTRACT
Using a phenome-wide association study (PheWAS) approach, we comprehensively tested genetic variants for association with phenotypes available for 70,061 study participants in the Population Architecture using Genomics and Epidemiology (PAGE) network. Our aim was to better characterize the genetic architecture of complex traits and identify novel pleiotropic relationships. This PheWAS drew on five population-based studies representing four major racial/ethnic groups (European Americans (EA), African Americans (AA), Hispanics/Mexican-Americans, and Asian/Pacific Islanders) in PAGE, each site with measurements for multiple traits, associated laboratory measures, and intermediate biomarkers. A total of 83 single nucleotide polymorphisms (SNPs) identified by genome-wide association studies (GWAS) were genotyped across two or more PAGE study sites. Comprehensive tests of association, stratified by race/ethnicity, were performed, encompassing 4,706 phenotypes mapped to 105 phenotype-classes, and association results were compared across study sites. A total of 111 PheWAS results had significant associations for two or more PAGE study sites with consistent direction of effect with a significance threshold of p<0.01 for the same racial/ethnic group, SNP, and phenotype-class. Among results identified for SNPs previously associated with phenotypes such as lipid traits, type 2 diabetes, and body mass index, 52 replicated previously published genotype-phenotype associations, 26 represented phenotypes closely related to previously known genotype-phenotype associations, and 33 represented potentially novel genotype-phenotype associations with pleiotropic effects. The majority of the potentially novel results were for single PheWAS phenotype-classes, for example, for CDKN2A/B rs1333049 (previously associated with type 2 diabetes in EA) a PheWAS association was identified for hemoglobin levels in AA. 
Of note, however, GALNT2 rs2144300 (previously associated with high-density lipoprotein cholesterol levels in EA) had multiple potentially novel PheWAS associations, with hypertension related phenotypes in AA and with serum calcium levels and coronary artery disease phenotypes in EA. PheWAS identifies associations for hypothesis generation and exploration of the genetic architecture of complex traits.
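The cross-site replication rule described in this abstract (significance at p<0.01 in two or more study sites, with a consistent direction of effect, for the same racial/ethnic group, SNP, and phenotype-class) can be sketched as a simple filter. The record layout, field names, and example values below are hypothetical, and the sketch assumes the stricter reading that every significant result for a combination must share one direction.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Key = Tuple[str, str, str]  # (racial/ethnic group, SNP, phenotype-class)


def replicated_associations(
    results: List[dict], p_threshold: float = 0.01, min_sites: int = 2
) -> List[Key]:
    """Return keys significant in at least min_sites sites with one shared direction."""
    sites: Dict[Key, Set[str]] = defaultdict(set)
    directions: Dict[Key, Set[bool]] = defaultdict(set)
    for r in results:
        if r["p"] < p_threshold:
            key = (r["group"], r["snp"], r["phenotype_class"])
            sites[key].add(r["site"])
            directions[key].add(r["beta"] > 0)  # True means a positive effect
    return sorted(
        key for key in sites
        if len(sites[key]) >= min_sites and len(directions[key]) == 1
    )


# Hypothetical per-site association results.
example_results = [
    {"site": "siteA", "group": "EA", "snp": "snp1", "phenotype_class": "lipids", "beta": 0.12, "p": 0.002},
    {"site": "siteB", "group": "EA", "snp": "snp1", "phenotype_class": "lipids", "beta": 0.08, "p": 0.004},
    {"site": "siteA", "group": "AA", "snp": "snp2", "phenotype_class": "glucose", "beta": 0.10, "p": 0.001},
    {"site": "siteB", "group": "AA", "snp": "snp2", "phenotype_class": "glucose", "beta": -0.09, "p": 0.003},
    {"site": "siteA", "group": "EA", "snp": "snp3", "phenotype_class": "bmi", "beta": 0.20, "p": 0.0005},
]

example_replicated = replicated_associations(example_results)
```

Here only the first combination passes: the second has opposite directions across sites and the third is significant at a single site.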
Subjects
Genetic Association Studies; Genetic Pleiotropy; Genetic Predisposition to Disease; Genome-Wide Association Study; Calcium/blood; Coronary Artery Disease/genetics; Cyclin-Dependent Kinase Inhibitor p16/genetics; Ethnicity/genetics; Gene Regulatory Networks; Genomics; Hemoglobins/genetics; Humans; Hypertension/genetics; N-Acetylgalactosaminyltransferases; Phenotype; Polymorphism, Single Nucleotide/genetics; Polypeptide N-acetylgalactosaminyltransferase
ABSTRACT
Investigating the association between biobank-derived genomic data and the information in linked electronic health records (EHRs) is an emerging area of research for dissecting the architecture of complex human traits, where cases and controls for study are defined through electronic phenotyping algorithms deployed in large EHR systems. For our study, 2580 cataract cases and 1367 controls were identified within the Marshfield Personalized Medicine Research Project (PMRP) Biobank and linked EHR, which is a member of the NHGRI-funded electronic Medical Records and Genomics (eMERGE) Network. Our goal was to explore potential gene-gene and gene-environment interactions within these data for 529,431 single nucleotide polymorphisms (SNPs) with minor allele frequency > 1%, in order to explore higher-level associations with cataract risk beyond investigations of single SNP-phenotype associations. To build our SNP-SNP interaction models we used a prior-knowledge-driven filtering method called Biofilter to minimize the multiple testing burden of exploring the vast array of interaction models possible from our extensive number of SNPs. Using Biofilter, we developed 57,376 prior-knowledge-directed SNP-SNP models to test for association with cataract status. We selected models that required 6 sources of external domain knowledge. We identified 5 statistically significant models with an interaction term with p-value < 0.05, as well as an overall model with p-value < 0.05 associated with cataract status. We also conducted gene-environment interaction analyses for all GWAS SNPs and a set of environmental factors from the PhenX Toolkit: smoking, UV exposure, and alcohol use; these environmental factors have been previously associated with the formation of cataracts. We found a total of 288 models that exhibit an interaction term with a p-value ≤ 1×10^-4 associated with cataract status. 
Our results show that these approaches enable advanced searches for epistasis and gene-environment interactions beyond GWAS, and that the EHR-based approach provides an additional source of data for seeking these advanced explanatory models of the etiology of complex diseases and outcomes such as cataract.
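To give a sense of the multiple testing burden mentioned in this abstract, the short calculation below compares the number of exhaustive pairwise SNP-SNP models for 529,431 SNPs with the 57,376 models retained by prior-knowledge filtering. It is illustrative arithmetic only. The Bonferroni-style threshold is an assumption used to show scale; the abstract reports its own thresholds and does not state that such a correction was applied. The function names are hypothetical.

```python
import math

N_SNPS = 529_431      # SNPs with minor allele frequency > 1% (from the abstract)
N_BIOFILTER = 57_376  # prior-knowledge-directed SNP-SNP models (from the abstract)

def exhaustive_pairs(n_snps: int) -> int:
    """Number of unordered SNP-SNP pairs, i.e. n choose 2."""
    return math.comb(n_snps, 2)

def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold under a Bonferroni correction (illustrative assumption)."""
    return alpha / n_tests

if __name__ == "__main__":
    all_pairs = exhaustive_pairs(N_SNPS)
    print(f"exhaustive pairwise models: {all_pairs:.3e}")
    print(f"filtered models:            {N_BIOFILTER}")
    print(f"reduction factor:           {all_pairs / N_BIOFILTER:.3e}")
    print(f"threshold, exhaustive:      {bonferroni_threshold(all_pairs):.3e}")
    print(f"threshold, filtered:        {bonferroni_threshold(N_BIOFILTER):.3e}")
```

The exhaustive search space is on the order of 10^11 pairs, which is why restricting the tests to knowledge-supported pairs changes the problem so substantially.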
Subjects
Cataract/etiology; Cataract/genetics; Epistasis, Genetic; Gene-Environment Interaction; Aged; Case-Control Studies; Computational Biology; Databases, Genetic/statistics & numerical data; Electronic Health Records/statistics & numerical data; Female; Genome-Wide Association Study/statistics & numerical data; Humans; Male; Middle Aged; Models, Genetic; Models, Statistical; Polymorphism, Single Nucleotide; Software
ABSTRACT
Genetic association studies have rapidly become a major tool for identifying the genetic basis of common human diseases. The advent of cost-effective genotyping coupled with large collections of samples linked to clinical outcomes and quantitative traits now makes it possible to systematically characterize genotype-phenotype relationships in diverse populations and extensive datasets. To capitalize on these advancements, the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) project, as part of the collaborative Population Architecture using Genomics and Epidemiology (PAGE) study, accesses two collections: the National Health and Nutrition Examination Surveys (NHANES) and BioVU, Vanderbilt University's biorepository linked to de-identified electronic medical records. We describe herein the workflows for accessing and using the epidemiologic (NHANES) and clinical (BioVU) collections, where each workflow has been customized to reflect the content and data access limitations of each respective source. We also describe the process by which these data are generated, standardized, and shared for meta-analysis among the PAGE study sites. As a specific example of the use of BioVU, we describe the data mining efforts to define cases and controls for genetic association studies of common cancers in PAGE. Collectively, the efforts described here are a generalized outline for many of the successful approaches that can be used in the era of high-throughput genotype-phenotype associations for moving biomedical discovery forward to new frontiers of data generation and analysis.
Subjects
Gene-Environment Interaction; Genetic Association Studies/statistics & numerical data; Computational Biology; Databases, Nucleic Acid/statistics & numerical data; Genetics, Population/statistics & numerical data; High-Throughput Screening Assays/statistics & numerical data; Humans; Linear Models; Neoplasms/genetics; Nutrition Surveys/statistics & numerical data; Polymorphism, Single Nucleotide; Registries/statistics & numerical data
ABSTRACT
Skin biopsy gene expression was analyzed by DNA microarray from 13 diffuse cutaneous systemic sclerosis (dSSc) patients enrolled in an open-label study of rituximab, 9 dSSc patients not treated with rituximab, and 9 healthy controls. These data recapitulate the patient "intrinsic" gene expression subsets described previously, including fibroproliferative, inflammatory, and normal-like groups. Serial skin biopsies showed consistent and non-progressing gene expression over time; importantly, patients in the inflammatory subset did not move to the fibroproliferative subset, or vice versa. We were unable to detect significant differences in gene expression before and after rituximab treatment, consistent with an apparent lack of clinical response. Serial biopsies from each patient stayed within the same gene expression subset, regardless of treatment regimen or the time point at which they were taken. Collectively, these data emphasize the heterogeneous nature of SSc and demonstrate that the intrinsic subsets are an inherent, reproducible, and stable feature of the disease that is independent of disease duration. Moreover, these data have fundamental importance for the future development of personalized therapy for SSc; drugs targeting inflammation are likely to benefit those patients with an inflammatory signature, whereas drugs targeting fibrosis are likely to benefit those with a fibroproliferative signature.
Subjects
Gene Expression; Scleroderma, Diffuse/genetics; Scleroderma, Diffuse/pathology; Antibodies, Monoclonal, Murine-Derived/pharmacology; Antibodies, Monoclonal, Murine-Derived/therapeutic use; Biopsy; Gene Expression Profiling; Humans; Immunohistochemistry; Immunologic Factors/pharmacology; Immunologic Factors/therapeutic use; Microarray Analysis; RNA, Messenger/metabolism; Rituximab; Scleroderma, Diffuse/drug therapy; Time Factors
ABSTRACT
Abnormal fibrillinogenesis is associated with connective tissue disorders (CTDs), including Marfan syndrome (MFS), systemic sclerosis (SSc) and Tight-skin (Tsk) mice. We have previously shown that TGF-beta and Wnt stimulate fibrillin-1 assembly and that fibrillin-1 and the developmental regulator CCN3 are both highly increased in Tsk skin. We investigated the role of CCN3 in abnormal fibrillinogenesis in Tsk mice, MFS, and SSc. Smad3 deletion in Tsk mice decreased CCN3 overexpression, suggesting that TGF-beta mediates at least part of the effect of Tsk fibrillin on CCN3, which is consistent with a synergistic effect of TGF-beta and Wnt in vitro on CCN3 expression. Disruption of fibrillin-1 assembly by MFS fibrillin decreased CCN3 expression, and skin from patients with early diffuse SSc showed a strong correlation between increased CCN3 and fibrillin-1 expression, suggesting that CCN3 regulation by fibrillin-1 extends to these CTDs. Diffuse SSc skin and sera also showed evidence of increased Wnt activity, implicating a Wnt stimulus behind this correlation. CCN3 overexpression markedly repressed fibrillin-1 assembly and also blocked other TGF-beta- and Wnt-regulated profibrotic gene expression. Together, these data indicate that CCN3 counter-regulates positive signals from TGF-beta and Wnt for fibrillin fibrillogenesis and profibrotic gene expression.
Subjects
Marfan Syndrome/metabolism; Microfilament Proteins/metabolism; Nephroblastoma Overexpressed Protein/metabolism; Scleroderma, Systemic/metabolism; Skin/metabolism; Transforming Growth Factor beta/metabolism; Wnt Proteins/metabolism; Animals; Biopsy; CCN Intercellular Signaling Proteins; Case-Control Studies; Cells, Cultured; Disease Models, Animal; Fibrillin-1; Fibrillins; Humans; Intracellular Signaling Peptides and Proteins/metabolism; Marfan Syndrome/pathology; Mice; Mice, Mutant Strains; Microfilament Proteins/antagonists & inhibitors; Proto-Oncogene Proteins/metabolism; Scleroderma, Systemic/pathology; Signal Transduction/physiology; Skin/pathology; Smad3 Protein/metabolism; Transforming Growth Factor beta/antagonists & inhibitors; Wnt Proteins/antagonists & inhibitors
ABSTRACT
BACKGROUND: The MYC oncogene contributes to induction and growth of many cancers but the full spectrum of the MYC transcriptional response remains unclear. METHODOLOGY/PRINCIPAL FINDINGS: Using microarrays, we conducted a detailed kinetic study of genes that respond to MYCN or MYCNDeltaMBII induction in primary human fibroblasts. In parallel, we determined the response to steady state overexpression of MYCN and MYCNDeltaMBII in the same cell type. An overlapping set of 398 genes from the two protocols was designated a 'Core MYC Signature' and used for further analysis. Comparison of the Core MYC Signature to a published study of the genes induced by serum stimulation revealed that only 7.4% of the Core MYC Signature genes are in the Core Serum Response and display similar expression changes to both MYC and serum. Furthermore, more than 50% of the Core MYC Signature genes were not influenced by serum stimulation. In contrast, comparison to a panel of breast cancers revealed a strong concordance in gene expression between the Core MYC Signature and the basal-like breast tumor subtype, which is a subtype with poor prognosis. This concordance was supported by the higher average level of MYC expression in the same tumor samples. CONCLUSIONS/SIGNIFICANCE: The Core MYC Signature has clinical relevance as this profile can be used to deduce an underlying genetic program that is likely to contribute to a clinical phenotype. Therefore, the presence of the Core MYC Signature may predict clinical responsiveness to therapeutics that are designed to disrupt MYC-mediated phenotypes.
Subjects
Breast Neoplasms/genetics; Gene Expression Profiling; Genes, myc; Blotting, Western; Breast Neoplasms/blood; Breast Neoplasms/pathology; Cells, Cultured; Female; Humans; Immunohistochemistry; Oligonucleotide Array Sequence Analysis; Prognosis
ABSTRACT
BACKGROUND: Scleroderma is a clinically heterogeneous disease with a complex phenotype. The disease is characterized by vascular dysfunction, tissue fibrosis, internal organ dysfunction, and immune dysfunction resulting in autoantibody production. METHODOLOGY AND FINDINGS: We analyzed the genome-wide patterns of gene expression with DNA microarrays in skin biopsies from distinct scleroderma subsets including 17 patients with systemic sclerosis (SSc) with diffuse scleroderma (dSSc), 7 patients with SSc with limited scleroderma (lSSc), 3 patients with morphea, and 6 healthy controls. 61 skin biopsies were analyzed in a total of 75 microarray hybridizations. Analysis by hierarchical clustering demonstrates nearly identical patterns of gene expression in 17 out of 22 of the forearm and back skin pairs of SSc patients. Using this property of the gene expression, we selected a set of 'intrinsic' genes and analyzed the inherent data-driven groupings. Distinct patterns of gene expression separate patients with dSSc from those with lSSc and both are easily distinguished from normal controls. Our data show three distinct patient groups among the patients with dSSc and two groups among patients with lSSc. Each group can be distinguished by unique gene expression signatures indicative of proliferating cells, immune infiltrates and a fibrotic program. The intrinsic groups are statistically significant (p<0.001) and each has been mapped to clinical covariates of modified Rodnan skin score, interstitial lung disease, gastrointestinal involvement, digital ulcers, Raynaud's phenomenon and disease duration. We report a 177-gene signature that is associated with severity of skin disease in dSSc. CONCLUSIONS AND SIGNIFICANCE: Genome-wide gene expression profiling of skin biopsies demonstrates that the heterogeneity in scleroderma can be measured quantitatively with DNA microarrays. 
The diversity in gene expression demonstrates multiple distinct gene expression programs in the skin of patients with scleroderma.
Subjects
Gene Expression Regulation; Scleroderma, Diffuse/genetics; Scleroderma, Diffuse/metabolism; Scleroderma, Limited/genetics; Scleroderma, Limited/metabolism; Scleroderma, Systemic/genetics; Scleroderma, Systemic/metabolism; Adult; Aged; Biopsy; Case-Control Studies; Female; Gene Expression Profiling; Humans; Male; Middle Aged; Oligonucleotide Array Sequence Analysis; Phenotype
ABSTRACT
Elastography based on strain imaging currently suffers from mechanical artefacts and limited contrast transfer efficiency. Solving the inverse elasticity problem (IEP) should obviate these difficulties; however, this approach to elastography is often fraught with problems because of the ill-posed nature of the IEP. The aim of the present study was to determine how the quality of modulus elastograms computed by solving the IEP compared with that of elastograms produced using standard strain imaging methodology. Strain-based modulus elastograms (i.e., modulus elastograms computed by simply inverting strain elastograms based on the assumption of stress uniformity) and model-based modulus elastograms (i.e., modulus elastograms computed by solving the IEP) were computed from a common cohort of simulated and gelatin-based phantoms that contained inclusions of varying size and modulus contrast. The ensuing elastograms were evaluated using the contrast-to-noise ratio (CNR_e) and contrast transfer efficiency (CTE_e) performance metrics. The results demonstrated that, at a fixed spatial resolution, the CNR_e of strain-based modulus elastograms was statistically equivalent to that of elastograms computed by solving the IEP. At low modulus contrast, the CTE_e of both elastographic imaging approaches was comparable; however, at high modulus contrast, the CTE_e of model-based modulus elastograms was superior.
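The abstract names the two performance metrics but does not define them. As a rough sketch, the code below uses definitions that are common in the elastography literature and are assumed here: the elastographic contrast-to-noise ratio as 2(m_t - m_b)^2 / (v_t + v_b), with m and v the mean and variance of the elastogram inside the inclusion (t) and in the background (b), and the contrast transfer efficiency as the ratio of observed contrast to true modulus contrast. The study may have used different forms (for example, values in decibels). The function and parameter names are hypothetical.

```python
from statistics import mean, pvariance

def cnr_e(target, background):
    """Elastographic contrast-to-noise ratio (assumed definition):
    2 * (mean_t - mean_b)^2 / (var_t + var_b)."""
    mu_t, mu_b = mean(target), mean(background)
    return 2.0 * (mu_t - mu_b) ** 2 / (pvariance(target) + pvariance(background))

def cte_e(target, background, true_contrast):
    """Contrast transfer efficiency (assumed definition):
    observed contrast (inclusion mean / background mean)
    divided by the true modulus contrast."""
    observed_contrast = mean(target) / mean(background)
    return observed_contrast / true_contrast

if __name__ == "__main__":
    # Made-up modulus estimates sampled from two regions of an elastogram.
    inclusion = [3.0, 3.2, 2.8]
    surrounding = [1.0, 1.1, 0.9]
    print(cnr_e(inclusion, surrounding))
    print(cte_e(inclusion, surrounding, true_contrast=4.0))
```

Under these definitions, a CTE_e of 1 means the elastogram reproduces the true modulus contrast exactly, and values below 1 mean the contrast is underestimated.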