ABSTRACT
Circulating proteins have important functions in inflammation and a broad range of diseases. To identify genetic influences on inflammation-related proteins, we conducted a genome-wide protein quantitative trait locus (pQTL) study of 91 plasma proteins measured using the Olink Target platform in 14,824 participants. We identified 180 pQTLs (59 cis, 121 trans). Integration of pQTL data with eQTL and disease genome-wide association studies provided insight into pathogenesis, implicating lymphotoxin-α in multiple sclerosis. Using Mendelian randomization (MR) to assess causality in disease etiology, we identified both shared and distinct effects of specific proteins across immune-mediated diseases, including directionally discordant effects of CD40 on risk of rheumatoid arthritis versus multiple sclerosis and inflammatory bowel disease. MR implicated CXCL5 in the etiology of ulcerative colitis (UC) and we show elevated gut CXCL5 transcript expression in patients with UC. These results identify targets of existing drugs and provide a powerful resource to facilitate future drug target prioritization.
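To illustrate the Mendelian randomization step mentioned in this abstract, the following is a minimal sketch of a single-instrument Wald ratio estimate. The abstract does not state which MR estimator was used, so the choice of method, the helper name `wald_ratio`, and all numeric inputs are assumptions made purely for illustration.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument Wald ratio (hypothetical helper, not from the study).

    beta_exposure: variant effect on the protein level (pQTL effect)
    beta_outcome:  variant effect on disease risk (log odds ratio)
    se_outcome:    standard error of beta_outcome
    Returns the causal estimate and a first-order standard error that
    ignores uncertainty in beta_exposure.
    """
    if beta_exposure == 0:
        raise ValueError("instrument has no effect on the exposure")
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


# Made-up numbers, not taken from the study.
estimate, se = wald_ratio(beta_exposure=0.5, beta_outcome=0.1, se_outcome=0.02)
odds_ratio = math.exp(estimate)  # odds ratio per unit increase in protein level
print(f"log OR = {estimate:.3f} (SE {se:.3f}), OR = {odds_ratio:.3f}")
```

A positive estimate would suggest that genetically higher protein levels raise disease risk, and a negative one the opposite, which is how directionally discordant effects across diseases could show up in such an analysis.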
Subjects
Ulcerative Colitis, Inflammatory Bowel Diseases, Multiple Sclerosis, Humans, Genome-Wide Association Study, Inflammatory Bowel Diseases/genetics, Quantitative Trait Loci, Ulcerative Colitis/drug therapy, Ulcerative Colitis/genetics, Inflammation/genetics, Multiple Sclerosis/genetics, Single Nucleotide Polymorphism
ABSTRACT
Integrating human genomics and proteomics can help elucidate disease mechanisms, identify clinical biomarkers and discover drug targets [1-4]. Because previous proteogenomic studies have focused on common variation via genome-wide association studies, the contribution of rare variants to the plasma proteome remains largely unknown. Here we identify associations between rare protein-coding variants and 2,923 plasma protein abundances measured in 49,736 UK Biobank individuals. Our variant-level exome-wide association study identified 5,433 rare genotype-protein associations, of which 81% were undetected in a previous genome-wide association study of the same cohort [5]. We then looked at aggregate signals using gene-level collapsing analysis, which revealed 1,962 gene-protein associations. Of the 691 gene-level signals from protein-truncating variants, 99.4% were associated with decreased protein levels. STAB1 and STAB2, encoding scavenger receptors involved in plasma protein clearance, emerged as pleiotropic loci, with 77 and 41 protein associations, respectively. We demonstrate the utility of our publicly accessible resource through several applications. These include detailing an allelic series in NLRC4, identifying potential biomarkers for a fatty liver disease-associated variant in HSD17B13 and bolstering phenome-wide association studies by integrating protein quantitative trait loci with protein-truncating variants in collapsing analyses. Finally, we uncover distinct proteomic consequences of clonal haematopoiesis (CH), including an association between TET2-CH and increased FLT3 levels. Our results highlight a considerable role for rare variation in plasma protein abundance and the value of proteogenomics in therapeutic discovery.
Subjects
Biological Specimen Banks, Blood Proteins, Genetic Association Studies, Genomics, Proteomics, Humans, Alleles, Biomarkers/blood, Blood Proteins/analysis, Blood Proteins/genetics, Factual Databases, Exome/genetics, Hematopoiesis, Mutation, Plasma/chemistry, United Kingdom
ABSTRACT
The use of omic modalities to dissect the molecular underpinnings of common diseases and traits is becoming increasingly common. But multi-omic traits can be genetically predicted, which enables highly cost-effective and powerful analyses for studies that do not have multi-omics [1]. Here we examine a large cohort (the INTERVAL study [2]; n = 50,000 participants) with extensive multi-omic data for plasma proteomics (SomaScan, n = 3,175; Olink, n = 4,822), plasma metabolomics (Metabolon HD4, n = 8,153), serum metabolomics (Nightingale, n = 37,359) and whole-blood Illumina RNA sequencing (n = 4,136), and use machine learning to train genetic scores for 17,227 molecular traits, including 10,521 that reach Bonferroni-adjusted significance. We evaluate the performance of genetic scores through external validation across cohorts of individuals of European, Asian and African American ancestries. In addition, we show the utility of these multi-omic genetic scores by quantifying the genetic control of biological pathways and by generating a synthetic multi-omic dataset of the UK Biobank [3] to identify disease associations using a phenome-wide scan. We highlight a series of biological insights with regard to genetic mechanisms in metabolism and canonical pathway associations with disease; for example, JAK-STAT signalling and coronary atherosclerosis. Finally, we develop a portal (https://www.omicspred.org/) to facilitate public access to all genetic scores and validation results, as well as to serve as a platform for future extensions and enhancements of multi-omic genetic scores.
Subjects
Coronary Artery Disease, Multiomics, Humans, Coronary Artery Disease/genetics, Coronary Artery Disease/metabolism, Metabolomics/methods, Phenotype, Proteomics/methods, Machine Learning, Black or African American/genetics, Asian/genetics, European People/genetics, United Kingdom, Datasets as Topic, Internet, Reproducibility of Results, Cohort Studies, Proteome/analysis, Proteome/metabolism, Metabolome, Plasma/metabolism, Factual Databases
ABSTRACT
Gene misexpression is the aberrant transcription of a gene in a context where it is usually inactive. Despite its known pathological consequences in specific rare diseases, we have a limited understanding of its wider prevalence and mechanisms in humans. To address this, we analyzed gene misexpression in 4,568 whole-blood bulk RNA sequencing samples from INTERVAL study blood donors. We found that while individual misexpression events occur rarely, in aggregate they were found in almost all samples and a third of inactive protein-coding genes. Using 2,821 paired whole-genome and RNA sequencing samples, we identified that misexpression events are enriched in cis for rare structural variants. We established putative mechanisms through which a subset of SVs lead to gene misexpression, including transcriptional readthrough, transcript fusions, and gene inversion. Overall, we develop misexpression as a type of transcriptomic outlier analysis and extend our understanding of the variety of mechanisms by which genetic variants can influence gene expression.
Subjects
Gene Expression Regulation, Humans, RNA Sequence Analysis, Genetic Variation, Genomic Structural Variation/genetics, Transcriptome/genetics, Blood Donors
ABSTRACT
Genome-wide association studies (GWASs) have established the contribution of common and low-frequency variants to metabolic blood measurements in the UK Biobank (UKB). To complement existing GWAS findings, we assessed the contribution of rare protein-coding variants in relation to 355 metabolic blood measurements (325 predominantly lipid-related nuclear magnetic resonance (NMR)-derived blood metabolite measurements from Nightingale Health Plc and 30 clinical blood biomarkers) using 412,393 exome sequences from four genetically diverse ancestries in the UKB. Gene-level collapsing analyses were conducted to evaluate a diverse range of rare-variant architectures for the metabolic blood measurements. Altogether, we identified significant associations (p < 1 × 10⁻⁸) for 205 distinct genes that involved 1,968 significant relationships for the Nightingale blood metabolite measurements and 331 for the clinical blood biomarkers. These include associations for rare non-synonymous variants in PLIN1 and CREB3L3 with lipid metabolite measurements and SYT7 with creatinine, among others, which may not only provide insights into novel biology but also deepen our understanding of established disease mechanisms. Of the study-wide significant clinical biomarker associations, 40% were not previously detected on analyzing coding variants in a GWAS in the same cohort, reinforcing the importance of studying rare variation to fully understand the genetic architecture of metabolic blood measurements.
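The idea behind a gene-level collapsing analysis can be sketched as follows: qualifying rare variants in a gene are collapsed into a single carrier indicator per individual, and carriers are then compared with non-carriers. This is only an illustrative sketch. The variant identifiers, the measurements, and the simple mean difference used in place of a formal statistical test are all assumptions, and the study's actual qualifying-variant models are not described in the abstract.

```python
from statistics import mean


def collapse_gene(genotypes, qualifying):
    """Return 1 for each individual carrying at least one qualifying allele, else 0.

    genotypes:  list of per-individual dicts {variant_id: alt_allele_count}
    qualifying: set of variant ids treated as qualifying for the gene
    """
    return [int(any(g.get(v, 0) > 0 for v in qualifying)) for g in genotypes]


# Toy data: hypothetical variant ids and blood measurements.
genotypes = [{"v1": 1}, {}, {"v2": 1, "v3": 0}, {"v3": 2}, {}]
measurement = [1.2, 2.0, 1.0, 2.1, 1.9]
qualifying = {"v1", "v2"}  # v3 is deliberately not a qualifying variant

carrier = collapse_gene(genotypes, qualifying)
carriers = [m for m, c in zip(measurement, carrier) if c]
non_carriers = [m for m, c in zip(measurement, carrier) if not c]
difference = mean(carriers) - mean(non_carriers)
print(carrier, round(difference, 2))
```

Changing the `qualifying` set (for example, restricting it to protein-truncating variants) corresponds to evaluating a different rare-variant architecture for the same gene.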
Subjects
Genetic Predisposition to Disease, Genome-Wide Association Study, Humans, Biological Specimen Banks, Biomarkers, Lipids, United Kingdom, Single Nucleotide Polymorphism
ABSTRACT
BACKGROUND: Understanding the genetic basis of human diseases has become integral to drug development and precision medicine. Recent advancements have enabled the identification of molecular pathways driving diseases, leading to targeted treatment strategies. The increasing investment in rare diseases by the biotech industry underscores the importance of genetic evidence in drug discovery and approval processes. Here we studied a monogenic Mendelian kidney disease, TRPC6-associated podocytopathy (TRPC6-AP), to present its natural history, genetic spectrum, and clinicopathological associations in a large cohort of patients with causal variants in TRPC6, in order to help define the specific features of the disease and further facilitate drug development and clinical trial design. METHODS: The study involved 64 individuals from 39 families with TRPC6 causal missense variants. Clinical data, including age of onset, laboratory results, response to treatment, kidney biopsy findings, and genetic information, were collected from multiple centers nationally and internationally. Exome or targeted sequencing was performed and variant classification was based on strict criteria. Structural and functional analyses of TRPC6 variants were conducted to understand their impact on protein function. In-depth reanalysis of light and electron microscopy specimens for 9 available kidney biopsies was conducted to identify pathological features and correlates of TRPC6-AP. RESULTS: Large-scale sequencing data did not support causality for TRPC6 protein-truncating variants. We identified 21 unique TRPC6 missense variants, clustering in three distinct regions of the protein, and with different effects on TRPC6 3D protein structure. Kidney biopsy analysis revealed focal segmental glomerulosclerosis (FSGS) patterns of injury in most cases, along with distinctive podocyte features including diffuse foot process effacement and swollen cell bodies. The majority of patients presented in adolescence or early adulthood but with ample variation (average 22, SD ± 14 years), with frequent progression to kidney failure but with variability in time between presentation and end-stage kidney disease (ESKD). CONCLUSIONS: This study provides insights into the genetic spectrum, clinicopathological associations, and natural history of TRPC6-AP.
ABSTRACT
Although plasma proteins have important roles in biological processes and are the direct targets of many drugs, the genetic factors that control inter-individual variation in plasma protein levels are not well understood. Here we characterize the genetic architecture of the human plasma proteome in healthy blood donors from the INTERVAL study. We identify 1,927 genetic associations with 1,478 proteins, a fourfold increase on existing knowledge, including trans associations for 1,104 proteins. To understand the consequences of perturbations in plasma protein levels, we apply an integrated approach that links genetic variation with biological pathway, disease, and drug databases. We show that protein quantitative trait loci overlap with gene expression quantitative trait loci, as well as with disease-associated loci, and find evidence that protein biomarkers have causal roles in disease using Mendelian randomization analysis. By linking genetic factors to diseases via specific proteins, our analyses highlight potential therapeutic targets, opportunities for matching existing drugs with new disease indications, and potential safety concerns for drugs under development.
Subjects
Blood Proteins/genetics, Genomics, Proteome/genetics, Female, Hepatocyte Growth Factor/genetics, Humans, Inflammatory Bowel Diseases/genetics, Male, Missense Mutation/genetics, Myeloblastin/genetics, Positive Regulatory Domain I-Binding Factor 1/genetics, Proto-Oncogene Proteins/genetics, Quantitative Trait Loci/genetics, Vasculitis/genetics, alpha 1-Antitrypsin/genetics
ABSTRACT
Mosaic mutations present in the germline have important implications for reproductive risk and disease transmission. We previously demonstrated a phenomenon occurring in the male germline, whereby specific mutations arising spontaneously in stem cells (spermatogonia) lead to clonal expansion, resulting in elevated mutation levels in sperm over time. This process, termed "selfish spermatogonial selection," explains the high spontaneous birth prevalence and strong paternal age-effect of disorders such as achondroplasia and Apert, Noonan and Costello syndromes, with direct experimental evidence currently available for specific positions of six genes (FGFR2, FGFR3, RET, PTPN11, HRAS, and KRAS). We present a discovery screen to identify novel mutations and genes showing evidence of positive selection in the male germline, by performing massively parallel simplex PCR using RainDance technology to interrogate mutational hotspots in 67 genes (51.5 kb in total) in 276 biopsies of testes from five men (median age, 83 yr). Following ultradeep sequencing (about 16,000×), development of a low-frequency variant prioritization strategy, and targeted validation, we identified 61 distinct variants present at frequencies as low as 0.06%, including 54 variants not previously directly associated with selfish selection. The majority (80%) of variants identified have previously been implicated in developmental disorders and/or oncogenesis and include mutations in six newly associated genes (BRAF, CBL, MAP2K1, MAP2K2, RAF1, and SOS1), all of which encode components of the RAS-MAPK pathway and activate signaling. Our findings extend the link between mutations dysregulating the RAS-MAPK pathway and selfish selection, and show that the aging male germline is a repository for such deleterious mutations.
Subjects
Mitogen-Activated Protein Kinases/metabolism, Mutation, Signal Transduction, Testis/metabolism, ras Proteins/metabolism, Aged, Aged 80 and over, Genetic Variation, Humans, Male, Middle Aged
ABSTRACT
BACKGROUND: Genetic, lifestyle, and environmental factors can lead to perturbations in circulating lipid levels and increase the risk of cardiovascular and metabolic diseases. However, how changes in individual lipid species contribute to disease risk is often unclear. Moreover, little is known about the role of lipids on cardiovascular disease in Pakistan, a population historically underrepresented in cardiovascular studies. METHODS: We characterised the genetic architecture of the human blood lipidome in 5662 hospital controls from the Pakistan Risk of Myocardial Infarction Study (PROMIS) and 13,814 healthy British blood donors from the INTERVAL study. We applied a candidate causal gene prioritisation tool to link the genetic variants associated with each lipid to the most likely causal genes, and Gaussian Graphical Modelling network analysis to identify and illustrate relationships between lipids and genetic loci. RESULTS: We identified 253 genetic associations with 181 lipids measured using direct infusion high-resolution mass spectrometry in PROMIS, and 502 genetic associations with 244 lipids in INTERVAL. Our analyses revealed new biological insights at genetic loci associated with cardiometabolic diseases, including novel lipid associations at the LPL, MBOAT7, LIPC, APOE-C1-C2-C4, SGPP1, and SPTLC3 loci. CONCLUSIONS: Our findings, generated using a distinctive lipidomics platform in an understudied South Asian population, strengthen and expand the knowledge base of the genetic determinants of lipids and their association with cardiometabolic disease-related loci.
Subjects
Genome-Wide Association Study, Myocardial Infarction, Asian People/genetics, Genetic Predisposition to Disease, Humans, Lipids, Single Nucleotide Polymorphism, White People
ABSTRACT
Quantitative trait locus (QTL) mapping of molecular phenotypes such as metabolites, lipids and proteins through genome-wide association studies represents a powerful means of highlighting molecular mechanisms relevant to human diseases. However, a major challenge of this approach is to identify the causal gene(s) at the observed QTLs. Here, we present a framework for the 'Prioritization of candidate causal Genes at Molecular QTLs' (ProGeM), which incorporates biological domain-specific annotation data alongside genome annotation data from multiple repositories. We assessed the performance of ProGeM using a reference set of 227 previously reported and extensively curated metabolite QTLs. For 98% of these loci, the expert-curated gene was one of the candidate causal genes prioritized by ProGeM. Benchmarking analyses revealed that 69% of the causal candidates were nearest to the sentinel variant at the investigated molecular QTLs, indicating that genomic proximity is the most reliable indicator of 'true positive' causal genes. In contrast, cis-gene expression QTL data led to three false positive candidate causal gene assignments for every one true positive assignment. We provide evidence that these conclusions also apply to other molecular phenotypes, suggesting that ProGeM is a powerful and versatile tool for annotating molecular QTLs. ProGeM is freely available via GitHub.
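Because the abstract reports genomic proximity as the most reliable indicator of a causal gene, a minimal sketch of a nearest-gene lookup is given here. It is not ProGeM's code: the function names, gene names, and coordinates are hypothetical, and the sketch assumes all genes lie on the same chromosome as the sentinel variant.

```python
def distance_to_gene(position, start, end):
    """Base-pair distance from a variant to a gene body (0 if inside it)."""
    if start <= position <= end:
        return 0
    return min(abs(position - start), abs(position - end))


def nearest_gene(position, genes):
    """Return the name of the gene closest to the variant.

    genes: list of (name, start, end) tuples on the variant's chromosome.
    """
    return min(genes, key=lambda g: distance_to_gene(position, g[1], g[2]))[0]


# Hypothetical gene coordinates for illustration only.
genes = [
    ("GENE_A", 1_000, 5_000),
    ("GENE_B", 9_000, 12_000),
    ("GENE_C", 20_000, 25_000),
]
print(nearest_gene(8_200, genes))  # sentinel variant at position 8,200
```

In practice a proximity rule like this would be one line of evidence among several, alongside the annotation-based evidence the abstract describes.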
Subjects
Genetic Association Studies, Genome-Wide Association Study/methods, Molecular Sequence Annotation/methods, Quantitative Trait Loci/genetics, Chromosome Mapping/methods, Humans, Lipids/genetics, Phenotype, Proteins/genetics
ABSTRACT
Epigenetic and transcriptional variability contribute to the vast diversity of cellular and organismal phenotypes and are key in human health and disease. In this review, we describe different types, sources, and determinants of epigenetic and transcriptional variability, enabling cells and organisms to adapt and evolve to a changing environment. We highlight the latest research and hypotheses on how chromatin structure and the epigenome influence gene expression variability. Further, we provide an overview of challenges in the analysis of biological variability. An improved understanding of the molecular mechanisms underlying epigenetic and transcriptional variability, at both the intra- and inter-individual level, provides great opportunity for disease prevention, better therapeutic approaches, and personalized medicine.
Subjects
Physiological Adaptation/genetics, Population Biological Variation/genetics, Genetic Epigenesis, Genetic Variation, Genetic Transcription, Individual Biological Variation, Chromatin/genetics, Humans, Precision Medicine
ABSTRACT
Direct infusion high-resolution mass spectrometry (DIHRMS) is a novel, high-throughput approach to rapidly and accurately profile hundreds of lipids in human serum without prior chromatography, facilitating in-depth lipid phenotyping for large epidemiological studies to reveal the detailed associations of individual lipids with coronary heart disease (CHD) risk factors. Intact lipid profiling by DIHRMS was performed on 5662 serum samples from healthy participants in the Pakistan Risk of Myocardial Infarction Study (PROMIS). We developed a novel semi-targeted peak-picking algorithm to detect mass-to-charge ratios in positive and negative ionization modes. We analyzed lipid partial correlations, assessed the association of lipid principal components with established CHD risk factors and genetic variants, and examined differences between lipids for a common genetic polymorphism. The DIHRMS method provided information on 360 lipids (including fatty acyls, glycerolipids, glycerophospholipids, sphingolipids, and sterol lipids), with a median coefficient of variation of 11.6% (range: 5.4-51.9). The lipids were highly correlated and exhibited a range of associations with clinical chemistry biomarkers and lifestyle factors. This platform can provide many novel insights into the effects of physiology and lifestyle on lipid metabolism, genetic determinants of lipids, and the relationship between individual lipids and CHD risk factors.
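The median coefficient of variation quoted in this abstract can be illustrated with a short sketch. The abstract does not say how the coefficients were computed, so the use of repeated quality-control measurements, the sample standard deviation, and the lipid names and values below are all assumptions.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical repeated measurements for three lipids (arbitrary units).
qc_replicates = {
    "lipid_1": [10.0, 11.0, 9.0],
    "lipid_2": [100.0, 120.0, 80.0],
    "lipid_3": [5.0, 5.5, 4.5],
}

cv_by_lipid = {name: coefficient_of_variation(v) for name, v in qc_replicates.items()}
median_cv = median(cv_by_lipid.values())
print({name: round(cv, 1) for name, cv in cv_by_lipid.items()}, round(median_cv, 1))
```

Summarising by the median rather than the mean keeps a few poorly measured lipids from dominating the platform-level figure.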
Subjects
Biomarkers/blood, Coronary Disease/genetics, Lipids/genetics, Coronary Disease/blood, Coronary Disease/pathology, Female, Genetic Variation, Glycerophospholipids/blood, Humans, Lipid Metabolism/genetics, Lipids/blood, Male, Middle Aged, Risk Factors, Sphingolipids/blood, Sphingolipids/genetics, Sterols/blood
ABSTRACT
The Asp358Ala variant in the interleukin-6 receptor (IL-6R) gene has been implicated in asthma, autoimmune and cardiovascular disorders, but its role in other respiratory conditions such as chronic obstructive pulmonary disease (COPD) has not been investigated. The aims of this study were to evaluate whether there is an association between Asp358Ala and COPD or asthma risk, and to explore the role of the Asp358Ala variant in sIL-6R shedding from neutrophils and its pro-inflammatory effects in the lung. We undertook logistic regression using data from the UK Biobank and the ECLIPSE COPD cohort. Results were meta-analyzed with summary data from a further three COPD cohorts (7,519 total cases and 35,653 total controls), showing no association between Asp358Ala and COPD (OR = 1.02 [95% CI: 0.96, 1.07]). Data from the UK Biobank showed a positive association between the Asp358Ala variant and atopic asthma (OR = 1.07 [1.01, 1.13]). In a series of in vitro studies using blood samples from 37 participants, we found that shedding of sIL-6R from neutrophils was greater in carriers of the Asp358Ala minor allele than in non-carriers. Human pulmonary artery endothelial cells cultured with serum from homozygous carriers of the minor allele showed increased MCP-1 release, and this difference was eliminated upon addition of tocilizumab. In conclusion, there is evidence that neutrophils may be an important source of sIL-6R in the lungs, and the Asp358Ala variant may have pro-inflammatory effects in lung cells. However, we were unable to identify evidence for an association between Asp358Ala and COPD.
Subjects
Asthma/genetics, Genetic Association Studies, Chronic Obstructive Pulmonary Disease/genetics, Interleukin-6 Receptors/genetics, Asthma/blood, Asthma/pathology, Female, Humans, Lung/metabolism, Lung/pathology, Male, Neutrophils/metabolism, Neutrophils/pathology, Chronic Obstructive Pulmonary Disease/blood, Chronic Obstructive Pulmonary Disease/pathology
ABSTRACT
BACKGROUND: Low-risk limits recommended for alcohol consumption vary substantially across different national guidelines. To define thresholds associated with lowest risk for all-cause mortality and cardiovascular disease, we studied individual-participant data from 599 912 current drinkers without previous cardiovascular disease. METHODS: We did a combined analysis of individual-participant data from three large-scale data sources in 19 high-income countries (the Emerging Risk Factors Collaboration, EPIC-CVD, and the UK Biobank). We characterised dose-response associations and calculated hazard ratios (HRs) per 100 g per week of alcohol (12·5 units per week) across 83 prospective studies, adjusting at least for study or centre, age, sex, smoking, and diabetes. To be eligible for the analysis, participants had to have information recorded about their alcohol consumption amount and status (ie, non-drinker vs current drinker), plus age, sex, history of diabetes and smoking status, at least 1 year of follow-up after baseline, and no baseline history of cardiovascular disease. The main analyses focused on current drinkers, whose baseline alcohol consumption was categorised into eight predefined groups according to the amount in grams consumed per week. We assessed alcohol consumption in relation to all-cause mortality, total cardiovascular disease, and several cardiovascular disease subtypes. We corrected HRs for estimated long-term variability in alcohol consumption using 152 640 serial alcohol assessments obtained some years apart (median interval 5·6 years [5th-95th percentile 1·04-13·5]) from 71 011 participants from 37 studies. FINDINGS: In the 599 912 current drinkers included in the analysis, we recorded 40 310 deaths and 39 018 incident cardiovascular disease events during 5·4 million person-years of follow-up. For all-cause mortality, we recorded a positive and curvilinear association with the level of alcohol consumption, with the minimum mortality risk around or below 100 g per week. Alcohol consumption was roughly linearly associated with a higher risk of stroke (HR per 100 g per week higher consumption 1·14, 95% CI, 1·10-1·17), coronary disease excluding myocardial infarction (1·06, 1·00-1·11), heart failure (1·09, 1·03-1·15), fatal hypertensive disease (1·24, 1·15-1·33), and fatal aortic aneurysm (1·15, 1·03-1·28). By contrast, increased alcohol consumption was log-linearly associated with a lower risk of myocardial infarction (HR 0·94, 0·91-0·97). In comparison to those who reported drinking >0-≤100 g per week, those who reported drinking >100-≤200 g per week, >200-≤350 g per week, or >350 g per week had lower life expectancy at age 40 years of approximately 6 months, 1-2 years, or 4-5 years, respectively. INTERPRETATION: In current drinkers of alcohol in high-income countries, the threshold for lowest risk of all-cause mortality was about 100 g/week. For cardiovascular disease subtypes other than myocardial infarction, there were no clear risk thresholds below which lower alcohol consumption stopped being associated with lower disease risk. These data support limits for alcohol consumption that are lower than those recommended in most current guidelines. FUNDING: UK Medical Research Council, British Heart Foundation, National Institute for Health Research, European Union Framework 7, and European Research Council.
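As a worked illustration of what a hazard ratio "per 100 g per week" implies for other increments, the sketch below rescales the reported stroke hazard ratio of 1·14. It assumes the log hazard is exactly linear in consumption, which is a simplification of the "roughly linear" association described above. The function name is hypothetical and the results are illustrative rather than findings of the study.

```python
import math


def rescale_hazard_ratio(hr_per_100g, grams):
    """Hazard ratio for a different weekly increment.

    Assumes the log hazard is linear in alcohol consumption, so the
    log hazard ratio scales in proportion to the increment in grams.
    """
    return math.exp(math.log(hr_per_100g) * grams / 100.0)


# The abstract equates 100 g per week with 12.5 units per week.
GRAMS_PER_UNIT = 100.0 / 12.5

hr_stroke_per_unit = rescale_hazard_ratio(1.14, GRAMS_PER_UNIT)
hr_stroke_per_200g = rescale_hazard_ratio(1.14, 200.0)
print(f"per unit: {hr_stroke_per_unit:.4f}, per 200 g: {hr_stroke_per_200g:.4f}")
```

Under this assumption, doubling the increment squares the hazard ratio rather than doubling the excess risk, which is why the rescaling is done on the log scale.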
Subjects
Alcohol Drinking/adverse effects, Alcohol Drinking/mortality, Cardiovascular Diseases/etiology, Female, Humans, Male, Middle Aged, Prospective Studies
ABSTRACT
Anaemia is a chief determinant of global ill health, contributing to cognitive impairment, growth retardation and impaired physical capacity. To understand further the genetic factors influencing red blood cells, we carried out a genome-wide association study of haemoglobin concentration and related parameters in up to 135,367 individuals. Here we identify 75 independent genetic loci associated with one or more red blood cell phenotypes at P < 10⁻⁸, which together explain 4-9% of the phenotypic variance per trait. Using expression quantitative trait loci and bioinformatic strategies, we identify 121 candidate genes enriched in functions relevant to red blood cell biology. The candidate genes are expressed preferentially in red blood cell precursors, and 43 have haematopoietic phenotypes in Mus musculus or Drosophila melanogaster. Through open-chromatin and coding-variant analyses we identify potential causal genetic variants at 41 loci. Our findings provide extensive new insights into genetic mechanisms and biological pathways controlling red blood cell formation and function.
Subjects
Erythrocytes/metabolism, Genetic Loci, Genome-Wide Association Study, Phenotype, Animals, Cell Cycle/genetics, Cytokines/metabolism, Drosophila melanogaster/genetics, Erythrocytes/cytology, Female, Gene Expression Regulation/genetics, Hematopoiesis/genetics, Hemoglobins/genetics, Humans, Male, Mice, Organ Specificity, Single Nucleotide Polymorphism/genetics, RNA Interference, Signal Transduction/genetics
ABSTRACT
PhenoScanner is a curated database of publicly available results from large-scale genetic association studies. This tool aims to facilitate 'phenome scans', the cross-referencing of genetic variants with many phenotypes, to help aid understanding of disease pathways and biology. The database currently contains over 350 million association results and over 10 million unique genetic variants, mostly single nucleotide polymorphisms. It is accompanied by a web-based tool that queries the database for associations with user-specified variants, providing results according to the same effect and non-effect alleles for each input variant. The tool provides the option of searching for trait associations with proxies of the input variants, calculated using the European samples from 1000 Genomes and Hapmap. AVAILABILITY AND IMPLEMENTATION: PhenoScanner is available at www.phenoscanner.medschl.cam.ac.uk. CONTACT: jrs95@medschl.cam.ac.uk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
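Reporting results "according to the same effect and non-effect alleles" implies some form of allele alignment across studies. The sketch below shows one simple way this could work: flipping the sign of an effect estimate when a study reports the opposite effect allele. It is not PhenoScanner's implementation. The function name, record layout, and values are hypothetical, and strand flips and palindromic variants are deliberately not handled.

```python
def align_effect(beta, effect_allele, other_allele, ref_effect, ref_other):
    """Express beta relative to ref_effect.

    Returns beta unchanged if the alleles already match, -beta if they are
    swapped, and None if the allele pair does not match the reference pair.
    """
    if (effect_allele, other_allele) == (ref_effect, ref_other):
        return beta
    if (effect_allele, other_allele) == (ref_other, ref_effect):
        return -beta
    return None


# Hypothetical association records for one variant with reference alleles A/G.
results = [
    {"trait": "trait_1", "beta": 0.12, "effect": "A", "other": "G"},
    {"trait": "trait_2", "beta": 0.05, "effect": "G", "other": "A"},
    {"trait": "trait_3", "beta": 0.30, "effect": "A", "other": "C"},
]

aligned = {
    r["trait"]: align_effect(r["beta"], r["effect"], r["other"], "A", "G")
    for r in results
}
print(aligned)
```

Once every estimate refers to the same effect allele, the direction of association can be compared directly across traits.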
Subjects
Factual Databases, Genetic Association Studies, Genetic Variation, Genotype, Humans, Phenotype, Single Nucleotide Polymorphism, Software
ABSTRACT
Nearly three-quarters of the 143 genetic signals associated with platelet and erythrocyte phenotypes identified by meta-analyses of genome-wide association (GWA) studies are located at non-protein-coding regions. Here, we assessed the role of candidate regulatory variants associated with cell type-restricted, closely related hematological quantitative traits in biologically relevant hematopoietic cell types. We used formaldehyde-assisted isolation of regulatory elements followed by next-generation sequencing (FAIRE-seq) to map regions of open chromatin in three primary human blood cells of the myeloid lineage. In the precursors of platelets and erythrocytes, as well as in monocytes, we found that open chromatin signatures reflect the corresponding hematopoietic lineages of the studied cell types and associate with the cell type-specific gene expression patterns. Dependent on their signal strength, open chromatin regions showed correlation with promoter and enhancer histone marks, distance to the transcription start site, and ontology classes of nearby genes. Cell type-restricted regions of open chromatin were enriched in sequence variants associated with hematological indices. The majority (63.6%) of such candidate functional variants at platelet quantitative trait loci (QTLs) coincided with binding sites of five transcription factors key in regulating megakaryopoiesis. We experimentally tested 13 candidate regulatory variants at 10 platelet QTLs and found that 10 (76.9%) affected protein binding, suggesting that this is a frequent mechanism by which regulatory variants influence quantitative trait levels. Our findings demonstrate that combining large-scale GWA data with open chromatin profiles of relevant cell types can be a powerful means of dissecting the genetic architecture of closely related quantitative traits.
Subjects
Chromatin Assembly and Disassembly, Chromatin/metabolism, Genetic Variation, Quantitative Trait Loci, Heritable Quantitative Trait, Nucleic Acid Regulatory Sequences, Blood Platelets/metabolism, Cell Lineage/genetics, Chromosome Mapping, Cluster Analysis, Erythrocytes/metabolism, Gene Expression Regulation, Genome-Wide Association Study, Histones/metabolism, Humans, Myeloid Cells/metabolism, Nucleosomes/metabolism, Organ Specificity/genetics, Phenotype, Single Nucleotide Polymorphism
ABSTRACT
Understanding the functional mechanisms underlying genetic signals associated with complex traits and common diseases, such as cancer, diabetes and Alzheimer's disease, is a formidable challenge. Many genetic signals discovered through genome-wide association studies map to non-protein coding sequences, where their molecular consequences are difficult to evaluate. This article summarizes concepts for the systematic interpretation of non-coding genetic signals using genome annotation data sets in different cellular systems. We outline strategies for the global analysis of multiple association intervals and the in-depth molecular investigation of individual intervals. We highlight experimental techniques to validate candidate (potential causal) regulatory variants, with a focus on novel genome-editing techniques including CRISPR/Cas9. These approaches are also applicable to low-frequency and rare variants, which have become increasingly important in genomic studies of complex traits and diseases. There is a pressing need to translate genetic signals into biological mechanisms, leading to prognostic, diagnostic and therapeutic advances.
Subjects
Genetic Variation/genetics, Computational Biology, Genetic Predisposition to Disease, Genome-Wide Association Study, Humans
ABSTRACT
We recently identified 68 genomic loci where common sequence variants are associated with platelet count and volume. Platelets are formed in the bone marrow by megakaryocytes, which are derived from hematopoietic stem cells by a process mainly controlled by transcription factors. The homeobox transcription factor MEIS1 is uniquely transcribed in megakaryocytes and not in the other lineage-committed blood cells. By ChIP-seq, we show that 5 of the 68 loci pinpoint a MEIS1 binding event within a group of 252 MK-overexpressed genes. In one such locus in DNM3, regulating platelet volume, the MEIS1 binding site falls within a region acting as an alternative promoter that is solely used in megakaryocytes, where allelic variation dictates different levels of a shorter transcript. The importance of dynamin activity to the latter stages of thrombopoiesis was confirmed by the observation that the inhibitor Dynasore reduced murine proplatelet formation in vitro.