ABSTRACT
Gene regulatory networks (GRNs) govern many core developmental and biological processes underlying human complex traits. Even with broad-scale efforts to characterize the effects of molecular perturbations and interpret gene coexpression, it remains challenging to infer the architecture of gene regulation in a precise and efficient manner. Key properties of GRNs, like hierarchical structure, modular organization, and sparsity, provide both challenges and opportunities for this objective. Here, we seek to better understand properties of GRNs using a new approach to simulate their structure and model their function. We produce realistic network structures with a novel generating algorithm based on insights from small-world network theory, and we model gene expression regulation using stochastic differential equations formulated to accommodate modeling molecular perturbations. With these tools, we systematically describe the effects of gene knockouts within and across GRNs, finding a subset of networks that recapitulate features of a recent genome-scale perturbation study. With deeper analysis of these exemplar networks, we consider future avenues to map the architecture of gene expression regulation using data from cells in perturbed and unperturbed states, finding that while perturbation data are critical to discover specific regulatory interactions, data from unperturbed cells may be sufficient to reveal regulatory programs.
ABSTRACT
Detecting epistatic drivers of human phenotypes is a considerable challenge. Traditional approaches use regression to sequentially test multiplicative interaction terms involving pairs of genetic variants. For higher-order interactions and genome-wide large-scale data, this strategy is computationally intractable. Moreover, multiplicative terms used in regression modeling may not capture the form of biological interactions. Building on the Predictability, Computability, Stability (PCS) framework, we introduce the epiTree pipeline to extract higher-order interactions from genomic data using tree-based models. The epiTree pipeline first selects a set of variants derived from tissue-specific estimates of gene expression. Next, it uses iterative random forests (iRF) to search training data for candidate Boolean interactions (pairwise and higher-order). We derive significance tests for interactions, based on a stabilized likelihood ratio test, by simulating Boolean tree-structured null (no epistasis) and alternative (epistasis) distributions on hold-out test data. Finally, our pipeline computes PCS epistasis p-values that probabilistically quantify improvement in prediction accuracy via bootstrap sampling on the test set. We validate the epiTree pipeline in two case studies using data from the UK Biobank: predicting red hair and multiple sclerosis (MS). In the case of predicting red hair, epiTree recovers known epistatic interactions surrounding MC1R and novel interactions, representing non-linearities not captured by logistic regression models. In the case of predicting MS, a more complex phenotype than red hair, epiTree rankings prioritize novel interactions surrounding HLA-DRB1, a variant previously associated with MS in several populations. Taken together, these results highlight the potential for epiTree rankings to help reduce the design space for follow-up experiments.
Subjects
Genetic Epistasis, Genome-Wide Association Study, Humans, Genome-Wide Association Study/methods, Phenotype, Multifactorial Inheritance/genetics, Logistic Models, Single Nucleotide Polymorphism
ABSTRACT
Mapping the functional human genome and the impact of genetic variants is often limited to population samples of European descent. To aid in overcoming this limitation, we measured gene expression using RNA sequencing in lymphoblastoid cell lines (LCLs) from 599 individuals from six African populations to identify novel transcripts including those not represented in the hg38 reference genome. We used whole genomes from the 1000 Genomes Project and 164 Maasai individuals to identify 8,881 expression and 6,949 splicing quantitative trait loci (eQTLs/sQTLs), and 2,611 structural variants associated with gene expression (SV-eQTLs). We further profiled chromatin accessibility using ATAC-Seq in a subset of 100 representative individuals, to identify chromatin accessibility quantitative trait loci (caQTLs) and allele-specific chromatin accessibility, and provide predictions for the functional effect of 78.9 million variants on chromatin accessibility. Using this map of eQTLs and caQTLs we fine-mapped GWAS signals for a range of complex diseases. Combined, this work expands global functional genomic data to identify novel transcripts, functional elements and variants, understand population genetic history of molecular quantitative trait loci, and further resolve the genetic basis of multiple human traits and disease.
ABSTRACT
BACKGROUND: Congenital heart disease (CHD) is highly heritable, but the power to identify inherited risk has been limited to analyses of common variants in small cohorts. METHODS: We performed reimputation of 4 CHD cohorts (n=55 342) to the TOPMed reference panel (freeze 5), permitting meta-analysis of 14 784 017 variants including 6 035 962 rare variants of high imputation quality as validated by whole genome sequencing. RESULTS: Meta-analysis identified 16 novel loci, including 12 rare variants, which displayed moderate or large effect sizes (median odds ratio, 3.02) for 4 separate CHD categories. Analyses of chromatin structure link 13 of the genome-wide significant loci to key genes in cardiac development; rs373447426 (minor allele frequency, 0.003 [odds ratio, 3.37 for conotruncal heart disease]; P=1.49×10⁻⁸) is predicted to disrupt chromatin structure for 2 nearby genes, BDH1 and DLG1, involved in conotruncal development. A lead variant rs189203952 (minor allele frequency, 0.01 [odds ratio, 2.4 for left ventricular outflow tract obstruction]; P=1.46×10⁻⁸) is predicted to disrupt the binding sites of 4 transcription factors known to participate in cardiac development in the promoter of SPAG9. A tissue-specific model of chromatin conformation suggests that common variant rs78256848 (minor allele frequency, 0.11 [odds ratio, 1.4 for conotruncal heart disease]; P=2.6×10⁻⁸) physically interacts with NCAM1 (P_FDR=1.86×10⁻²⁷), a neural adhesion molecule acting in cardiac development. Importantly, while each individual malformation displayed substantial heritability (observed h² ranging from 0.26 for complex malformations to 0.37 for left ventricular outflow tract obstructive disease), the risk for different CHD malformations appeared to be separate, without genetic correlation measured by linkage disequilibrium score regression or regional colocalization.
CONCLUSIONS: We describe a set of rare noncoding variants conferring significant risk for individual heart malformations which are linked to genes governing cardiac development. These results illustrate that the oligogenic basis of CHD and significant heritability may be linked to rare variants outside protein-coding regions conferring substantial risk for individual categories of cardiac malformation.
Subjects
Congenital Heart Defects, Humans, Congenital Heart Defects/diagnosis, Congenital Heart Defects/genetics, Phenotype, Gene Frequency, Whole Genome Sequencing, Chromatin, Signal Transducing Adaptor Proteins/genetics
ABSTRACT
BACKGROUND: A large proportion of genetic risk remains unexplained for structural heart disease involving the interventricular septum (IVS) including hypertrophic cardiomyopathy and ventricular septal defects. This study sought to develop a reproducible proxy of IVS structure from standard medical imaging, discover novel genetic determinants of IVS structure, and relate these loci to diseases of the IVS, hypertrophic cardiomyopathy, and ventricular septal defect. METHODS: We estimated the cross-sectional area of the IVS from the 4-chamber view of cardiac magnetic resonance imaging in 32 219 individuals from the UK Biobank, which was used as the basis of genome-wide association studies and Mendelian randomization. RESULTS: Measures of IVS cross-sectional area at diastole were a strong proxy for the 3-dimensional volume of the IVS (Pearson r=0.814, P=0.004), and correlated with anthropometric measures, blood pressure, and diagnostic codes related to cardiovascular physiology. Seven loci with clear genomic consequence and relevance to cardiovascular biology were uncovered by genome-wide association studies, most notably a single nucleotide polymorphism in an intron of CDKN1A (rs2376620; β, 7.7 mm² [95% CI, 5.8-11.0]; P=6.0×10⁻¹⁰), and a common inversion incorporating KANSL1 predicted to disrupt local chromatin structure (β, 8.4 mm² [95% CI, 6.3-10.9]; P=4.2×10⁻¹⁴). Mendelian randomization suggested that inheritance of larger IVS cross-sectional area at diastole was strongly associated with hypertrophic cardiomyopathy risk (pIVW=4.6×10⁻¹⁰) while inheritance of smaller IVS cross-sectional area at diastole was associated with risk for ventricular septal defect (pIVW=0.007). CONCLUSIONS: Automated estimates of cross-sectional area of the IVS support discovery of novel loci related to cardiac development and Mendelian disease. Inheritance of genetic liability for either small or large IVS appears to confer risk for ventricular septal defect or hypertrophic cardiomyopathy, respectively. These data suggest that a proportion of risk for structural and congenital heart disease can be localized to the common genetic determinants of size and shape of cardiovascular anatomy.
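The pIVW values reported above refer to inverse-variance weighted (IVW) Mendelian randomization. As an illustration only, the standard fixed-effect IVW estimator can be sketched as follows. The function name `ivw_estimate` and the toy numbers are assumptions made for this example and are not taken from the study.

```python
import numpy as np

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) causal estimate.

    Each instrument j has an effect on the exposure (bx_j), an effect on the
    outcome (by_j), and a standard error for the outcome effect (se_j).
    """
    bx = np.asarray(beta_exposure, dtype=float)
    by = np.asarray(beta_outcome, dtype=float)
    se = np.asarray(se_outcome, dtype=float)

    # Weight of each per-variant Wald ratio by_j / bx_j.
    weights = bx ** 2 / se ** 2
    estimate = np.sum(weights * (by / bx)) / np.sum(weights)
    std_err = np.sqrt(1.0 / np.sum(weights))
    return estimate, std_err

# Hypothetical summary statistics for three instruments (illustrative values).
est, se = ivw_estimate([0.1, 0.2, 0.3], [0.2, 0.4, 0.6], [0.05, 0.05, 0.05])
print(est, se)
```

In this toy input every outcome effect is exactly twice the exposure effect, so the estimate is 2. Real analyses typically add heterogeneity and pleiotropy checks that this sketch omits.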
Subjects
Hypertrophic Cardiomyopathy, Ventricular Heart Septal Defects, Humans, Genome-Wide Association Study, Hypertrophic Cardiomyopathy/diagnostic imaging, Hypertrophic Cardiomyopathy/genetics, Hypertrophic Cardiomyopathy/complications, Ventricular Heart Septal Defects/diagnostic imaging, Ventricular Heart Septal Defects/genetics, Ventricular Heart Septal Defects/complications, Heart, Magnetic Resonance Imaging
ABSTRACT
Polygenic risk scores (PRSs) quantify the contribution of multiple genetic loci to an individual's likelihood of a complex trait or disease. However, existing PRSs estimate this likelihood with common genetic variants, excluding the impact of rare variants. Here, we report on a method to identify rare variants associated with outlier gene expression and integrate their impact into PRS predictions for body mass index (BMI), obesity, and bariatric surgery. Between the top and bottom 10%, we observed a 20.8% increase in risk for obesity (p = 3 × 10⁻¹⁴), 62.3% increase in risk for severe obesity (p = 1 × 10⁻⁶), and median 5.29 years earlier onset for bariatric surgery (p = 0.008), as a function of expression outlier-associated rare variant burden when controlling for common variant PRS. We show that these predictions were more significant than integrating the effects of rare protein-truncating variants (PTVs), observing a mean 19% increase in phenotypic variance explained with expression outlier-associated rare variants when compared with PTVs (p = 2 × 10⁻¹⁵). We replicated these findings by using data from the Million Veteran Program and demonstrated that PRSs across multiple traits and diseases can benefit from the inclusion of expression outlier-associated rare variants identified through population-scale transcriptome sequencing.
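For readers unfamiliar with the term, a common-variant PRS is conventionally a weighted sum of allele dosages. The sketch below shows only that generic definition. It does not reproduce the rare-variant burden method of the abstract, and the function name and toy values are assumptions.

```python
import numpy as np

def polygenic_risk_score(dosages, weights):
    """Generic PRS: per-individual weighted sum of allele dosages.

    dosages: array of shape (n_individuals, n_variants), entries in [0, 2]
    weights: array of shape (n_variants,), per-allele effect size estimates
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return dosages @ weights

# Two hypothetical individuals genotyped at three variants (illustrative values).
scores = polygenic_risk_score([[0, 1, 2], [2, 0, 1]], [0.1, -0.2, 0.3])
print(scores)
```

Under this definition, adding a rare-variant burden term would amount to including one more covariate alongside the score in the downstream regression, which is left out here.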
Subjects
Multifactorial Inheritance, Obesity, Body Mass Index, Genetic Predisposition to Disease, Genome-Wide Association Study, Humans, Multifactorial Inheritance/genetics, Obesity/genetics, Phenotype, Risk Factors
ABSTRACT
Whole-genome sequencing studies applied to large populations or biobanks with extensive phenotyping raise new analytic challenges. The need to consider many variants at a locus or group of genes simultaneously and the potential to study many correlated phenotypes with shared genetic architecture provide opportunities for discovery not addressed by the traditional one variant, one phenotype association study. Here, we introduce a Bayesian model comparison approach called MRP (multiple rare variants and phenotypes) for rare-variant association studies that considers correlation, scale, and direction of genetic effects across a group of genetic variants, phenotypes, and studies, requiring only summary statistic data. We apply our method to exome sequencing data (n = 184,698) across 2,019 traits from the UK Biobank, aggregating signals in genes. MRP demonstrates an ability to recover signals such as associations between PCSK9 and LDL cholesterol levels. We additionally find MRP effective in conducting meta-analyses in exome data. Non-biomarker findings include associations between MC1R and red hair color and skin color, IL17RA and monocyte count, and IQGAP2 and mean platelet volume. Finally, we apply MRP in a multi-phenotype setting; after clustering the 35 biomarker phenotypes based on genetic correlation estimates, we find that joint analysis of these phenotypes results in substantial power gains for gene-trait associations, such as in TNFRSF13B in one of the clusters containing diabetes- and lipid-related traits. Overall, we show that the MRP model comparison approach improves upon useful features from widely used meta-analysis approaches for rare-variant association analyses and prioritizes protective modifiers of disease risk.
Subjects
Genetic Variation, Genome-Wide Association Study, Genetic Models, Bayes Theorem, Female, Humans, Male, Phenotype
ABSTRACT
Objective: The purpose of this study was to understand the experiences of historically underrepresented graduate students, more than half of whom were enrolled in science, technology, engineering, and mathematics (STEM) disciplines, during the COVID-19 pandemic. This focus group study represents an initial stage in developing an intervention for historically underrepresented graduate students and their families. Background: Underrepresentation of graduate students of color in STEM has been attributed to a myriad of factors, including a lack of support systems. Familial support is an endorsed reason for persisting in graduate school. It is unclear what historically underrepresented graduate students' experiences are during uncertain times, such as a pandemic. Method: Focus groups were conducted online using a videoconferencing platform during the COVID-19 pandemic. Five focus groups included: historically underrepresented doctoral students (n = 5), historically underrepresented master's students (n = 6), academic faculty (n = 7), administrator, administrative faculty, and academic faculty (n = 6), and families of historically underrepresented doctoral students (n = 6). Data were analyzed using thematic analysis. Results: Historically underrepresented graduate students experienced difficulties in accessing resources, adjustments to home and family life, amplification of existing nonfinancial issues, and expressed both fears of and hopes for the future. Conclusion: The COVID-19 pandemic exacerbated existing inequalities in access to resources as well as nonfinancial family support. Implications: This study may help normalize historically underrepresented graduate students' experiences during the COVID-19 pandemic. The findings include ideas for informing families about graduate school that can enlighten family support efforts for historically underrepresented graduate students and their families.
ABSTRACT
Polygenic risk models have led to significant advances in understanding complex diseases and their clinical presentation. While polygenic risk scores (PRS) can effectively predict outcomes, they do not generally account for disease subtypes or pathways which underlie within-trait diversity. Here, we introduce a latent factor model of genetic risk based on components from Decomposition of Genetic Associations (DeGAs), which we call the DeGAs polygenic risk score (dPRS). We compute DeGAs using genetic associations for 977 traits and find that dPRS performs comparably to standard PRS while offering greater interpretability. We show how to decompose an individual's genetic risk for a trait across DeGAs components, with examples for body mass index (BMI) and myocardial infarction (heart attack) in 337,151 white British individuals in the UK Biobank, with replication in a further set of 25,486 non-British white individuals. We find that BMI polygenic risk factorizes into components related to fat-free mass, fat mass, and overall health indicators like physical activity. Most individuals with high dPRS for BMI have strong contributions from both a fat-mass component and a fat-free mass component, whereas a few "outlier" individuals have strong contributions from only one of the two components. Overall, our method enables fine-scale interpretation of the drivers of genetic risk for complex traits.
Subjects
Genetic Association Studies, Genetic Predisposition to Disease, Multifactorial Inheritance, Heritable Quantitative Trait, Algorithms, Biological Specimen Banks, Genetic Databases, Genetic Association Studies/methods, Genome-Wide Association Study, Humans, Genetic Models, Phenotype, Population Surveillance, Reproducibility of Results, Risk Assessment, Risk Factors, United Kingdom/epidemiology
ABSTRACT
Clinical laboratory tests are a critical component of the continuum of care. We evaluate the genetic basis of 35 blood and urine laboratory measurements in the UK Biobank (n = 363,228 individuals). We identify 1,857 loci associated with at least one trait, containing 3,374 fine-mapped associations and additional sets of large-effect (>0.1 s.d.) protein-altering, human leukocyte antigen (HLA) and copy number variant (CNV) associations. Through Mendelian randomization (MR) analysis, we discover 51 causal relationships, including previously known agonistic effects of urate on gout and cystatin C on stroke. Finally, we develop polygenic risk scores (PRSs) for each biomarker and build 'multi-PRS' models for diseases using 35 PRSs simultaneously, which improved chronic kidney disease, type 2 diabetes, gout and alcoholic cirrhosis genetic risk stratification in an independent dataset (FinnGen; n = 135,500) relative to single-disease PRSs. Together, our results delineate the genetic basis of biomarkers and their causal influences on diseases and improve genetic risk stratification for common diseases.
Subjects
Biomarkers/blood, Biomarkers/urine, HLA Antigens/genetics, Proteins/genetics, Biological Specimen Banks, Cardiovascular Diseases/genetics, Cardiovascular Diseases/metabolism, DNA Copy Number Variations, Type 2 Diabetes Mellitus/genetics, Type 2 Diabetes Mellitus/metabolism, Genetic Pleiotropy, Humans, Linkage Disequilibrium, Liver-Specific Organic Anion Transporter 1/genetics, Mendelian Randomization Analysis, Single Nucleotide Polymorphism, Chronic Renal Insufficiency, Serine Endopeptidases/genetics, United Kingdom
ABSTRACT
The clinical evaluation of a genetic syndrome relies upon recognition of a characteristic pattern of signs or symptoms to guide targeted genetic testing for confirmation of the diagnosis. However, individuals displaying a single phenotype of a complex syndrome may not meet criteria for clinical diagnosis or genetic testing. Here, we present a phenome-wide association study (PheWAS) approach to systematically explore the phenotypic expressivity of common and rare alleles in genes associated with four well-described syndromic diseases (Alagille (AS), Marfan (MS), DiGeorge (DS), and Noonan (NS) syndromes) in the general population. Using human phenotype ontology (HPO) terms, we systematically mapped 60 phenotypes related to AS, MS, DS and NS in 337,198 unrelated white British individuals from the UK Biobank (UKBB) based on their hospital admission records, self-administered questionnaires, and physiological measurements. We performed logistic regression adjusting for age, sex, and the first 5 genetic principal components, for each phenotype and each variant in the target genes (JAG1, NOTCH2, FBN1, PTPN11 and RAS-opathy genes, and genes in the 22q11.2 locus) and performed a gene burden test. Overall, we observed multiple phenotype-genotype correlations, such as associations of variation in JAG1, FBN1, PTPN11 and SOS2 with diastolic and systolic blood pressure, and pleiotropy among multiple variants in syndromic genes. For example, rs11066309 in PTPN11 was significantly associated with a lower body mass index, an increased risk of hypothyroidism and a smaller size for gestational age, all in concordance with NS-related phenotypes. Similarly, rs589668 in FBN1 was associated with an increase in body height and blood pressure, and a reduced body fat percentage as observed in Marfan syndrome.
Our findings suggest that the spectrum of associations of common and rare variants in genes involved in syndromic diseases can be extended to individual phenotypes within the general population.
Subjects
Population Biological Variation/genetics, Genetic Association Studies/methods, Genome-Wide Association Study/methods, Alagille Syndrome/genetics, Alleles, DiGeorge Syndrome/genetics, Female, Gene Frequency/genetics, Genetic Predisposition to Disease/genetics, Genetic Testing/methods, Genetic Variation/genetics, Humans, Male, Marfan Syndrome/genetics, Noonan Syndrome/genetics, Phenotype, Single Nucleotide Polymorphism/genetics, United Kingdom, White People/genetics
ABSTRACT
The UK Biobank is a very large, prospective population-based cohort study across the United Kingdom. It provides unprecedented opportunities for researchers to investigate the relationship between genotypic information and phenotypes of interest. Multiple regression methods, compared with genome-wide association studies (GWAS), have already been shown to greatly improve the prediction performance for a variety of phenotypes. In high-dimensional settings, the lasso, since its first proposal in statistics, has proven to be an effective method for simultaneous variable selection and estimation. However, the large scale and ultrahigh dimension seen in the UK Biobank pose new challenges for applying the lasso method, as many existing algorithms and their implementations are not scalable to large applications. In this paper, we propose a computational framework called batch screening iterative lasso (BASIL) that can take advantage of any existing lasso solver and easily build a scalable solution for very large data, including those that are larger than the memory size. We introduce snpnet, an R package that implements the proposed algorithm on top of glmnet and optimizes for single nucleotide polymorphism (SNP) datasets. It currently supports ℓ1-penalized linear, logistic, and Cox regression models, and also extends to the elastic net with an ℓ1/ℓ2 penalty. We demonstrate results on the UK Biobank dataset, where we achieve competitive predictive performance for all four phenotypes considered (height, body mass index, asthma, high cholesterol) using only a small fraction of the variants compared with other established polygenic risk score methods.
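The batch-screening idea can be illustrated with a small sketch: fit the lasso on a screened subset of variables, check the optimality (KKT) condition on the variables left out, and add violators in batches until none remain. This is a simplified single-penalty illustration in Python with NumPy, not the snpnet implementation. The coordinate-descent solver, the function names, and the batch size are all assumptions made for the example.

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding operator used by the lasso coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Plain coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = np.array(y, dtype=float)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]          # remove variable j's contribution
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]          # add the updated contribution back
    return beta

def screened_lasso(X, y, lam, batch=5, max_rounds=20):
    """Fit the lasso on a growing screened set, verified by a KKT check."""
    n, p = X.shape
    active = np.array([], dtype=int)
    beta = np.zeros(p)
    for _ in range(max_rounds):
        resid = y - X @ beta
        score = np.abs(X.T @ resid) / n
        outside = np.setdiff1d(np.arange(p), active)
        # KKT condition for an excluded variable: |x_j' r| / n <= lam.
        violators = outside[score[outside] > lam + 1e-10]
        if violators.size == 0:
            break
        # Add the strongest violators, at most `batch` per round.
        top = violators[np.argsort(-score[violators])[:batch]]
        active = np.union1d(active, top)
        beta = np.zeros(p)
        beta[active] = lasso_cd(X[:, active], y, lam)
    return beta, active
```

Because only the screened columns are passed to the solver, a design matrix too large for memory could in principle be read in batches. That is the scalability argument the abstract makes, although the sketch itself keeps everything in memory.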
Subjects
Asthma/epidemiology, Biological Specimen Banks, Population Genetics, Genome-Wide Association Study, Algorithms, Asthma/blood, Asthma/genetics, Body Height/genetics, Body Mass Index, Cholesterol/blood, Cohort Studies, Genotype, Humans, Logistic Models, Phenotype, Single Nucleotide Polymorphism/genetics, Proportional Hazards Models, United Kingdom/epidemiology
ABSTRACT
Population-scale biobanks that combine genetic data and high-dimensional phenotyping for a large number of participants provide an exciting opportunity to perform genome-wide association studies (GWAS) to identify genetic variants associated with diverse quantitative traits and diseases. A major challenge for GWAS in population biobanks is ascertaining disease cases from heterogeneous data sources such as hospital records, digital questionnaire responses, or interviews. In this study, we use genetic parameters, including genetic correlation, to evaluate whether GWAS performed using cases in the UK Biobank ascertained from hospital records, questionnaire responses, and family history of disease implicate similar disease genetics across a range of effect sizes. We find that hospital record and questionnaire GWAS largely identify similar genetic effects for many complex phenotypes and that combining together both phenotyping methods improves power to detect genetic associations. We also show that family history GWAS using cases ascertained on family history of disease agrees with combined hospital record and questionnaire GWAS and that family history GWAS has better power to detect genetic associations for some phenotypes. Overall, this work demonstrates that digital phenotyping and unstructured phenotype data can be combined with structured data such as hospital records to identify cases for GWAS in biobanks and improve the ability of such studies to identify genetic associations.
Subjects
Disease/genetics, Genome-Wide Association Study, Phenotype, Asthma/genetics, Factual Databases, Female, Medical Genetics, Genotype, Humans, Male, Neoplasms/genetics, United Kingdom
ABSTRACT
Suicide accounts for nearly 800,000 deaths per year worldwide with rates of both deaths and attempts rising. Family studies have estimated substantial heritability of suicidal behavior; however, collecting the sample sizes necessary for successful genetic studies has remained a challenge. We utilized two different approaches in independent datasets to characterize the contribution of common genetic variation to suicide attempt. The first is a patient-reported suicide attempt phenotype asked as part of an online mental health survey taken by a subset of participants (n = 157,366) in the UK Biobank. After quality control, we leveraged a genotyped set of unrelated, white British ancestry participants including 2433 cases and 334,766 controls that included those that did not participate in the survey or were not explicitly asked about attempting suicide. The second leveraged electronic health record (EHR) data from the Vanderbilt University Medical Center (VUMC, 2.8 million patients, 3250 cases) and machine learning to derive probabilities of attempting suicide in 24,546 genotyped patients. We identified significant and comparable heritability estimates of suicide attempt from both the patient-reported phenotype in the UK Biobank (h²SNP = 0.035, p = 7.12 × 10⁻⁴) and the clinically predicted phenotype from VUMC (h²SNP = 0.046, p = 1.51 × 10⁻²). A significant genetic overlap was demonstrated between the two measures of suicide attempt in these independent samples through polygenic risk score analysis (t = 4.02, p = 5.75 × 10⁻⁵) and genetic correlation (rg = 1.073, SE = 0.36, p = 0.003). Finally, we show significant but incomplete genetic correlation of suicide attempt with insomnia (rg = 0.34-0.81) as well as several psychiatric disorders (rg = 0.26-0.79). This work demonstrates the contribution of common genetic variation to suicide attempt.
It points to a genetic underpinning to clinically predicted risk of attempting suicide that is similar to the genetic profile from a patient reported outcome. Lastly, it presents an approach for using EHR data and clinical prediction to generate quantitative measures from binary phenotypes that can improve power for genetic studies.
Subjects
Genome-Wide Association Study, Machine Learning, Probability, Attempted Suicide/statistics & numerical data, Biological Specimen Banks, Electronic Health Records, Female, Health Surveys, Humans, Male, Mental Health, Phenotype, Risk Factors, Suicidal Ideation, Tennessee, United Kingdom, White People/genetics
ABSTRACT
Population-based biobanks with genomic and dense phenotype data provide opportunities for generating effective therapeutic hypotheses and understanding the genomic role in disease predisposition. To characterize latent components of genetic associations, we apply truncated singular value decomposition (DeGAs) to matrices of summary statistics derived from genome-wide association analyses across 2,138 phenotypes measured in 337,199 White British individuals in the UK Biobank study. We systematically identify key components of genetic associations and the contributions of variants, genes, and phenotypes to each component. As an illustration of the utility of the approach to inform downstream experiments, we report putative loss of function variants, rs114285050 (GPR151) and rs150090666 (PDE3B), that substantially contribute to obesity-related traits and experimentally demonstrate the role of these genes in adipocyte biology. Our approach to dissect components of genetic associations across the human phenome will accelerate biomedical hypothesis generation by providing insights on previously unexplored latent structures.
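As a rough illustration of the decomposition step, a truncated singular value decomposition keeps only the leading K components of a summary-statistic matrix. The sketch below assumes rows are phenotypes and columns are variants and uses squared singular-vector entries as contribution scores. The orientation, the function names, and the toy matrix are assumptions for this example, not details confirmed by the abstract.

```python
import numpy as np

def truncated_svd(W, k):
    """Return the leading k singular triplets of W (W ~ U @ diag(s) @ Vt)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def contribution_scores(M):
    """Squared entries of each column, normalized so each column sums to 1."""
    sq = M ** 2
    return sq / sq.sum(axis=0, keepdims=True)

# Toy matrix of association statistics: 6 phenotypes x 8 variants, exact rank 2.
rng = np.random.default_rng(0)
W = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 8))

U, s, Vt = truncated_svd(W, k=2)
phenotype_scores = contribution_scores(U)    # phenotypes x components
variant_scores = contribution_scores(Vt.T)   # variants x components
print(phenotype_scores.sum(axis=0), variant_scores.sum(axis=0))
```

Each column of the score matrices sums to 1, so a score can be read as the share of a component attributable to a given phenotype or variant.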
Subjects
Adipocytes/metabolism, Biological Specimen Banks, Genetic Association Studies/methods, Genome-Wide Association Study/methods, 3T3-L1 Cells, Adipocytes/cytology, Animals, Cultured Cells, Type 3 Cyclic Nucleotide Phosphodiesterases/genetics, Genetic Predisposition to Disease/genetics, Humans, Mice, Obesity/genetics, Phenotype, Single Nucleotide Polymorphism, United Kingdom
ABSTRACT
Copy-number variations (CNVs) represent a significant proportion of the genetic differences between individuals, and many CNVs associate causally with syndromic disease and clinical outcomes. Here, we characterize the landscape of copy-number variation and its phenome-wide effects in a sample of 472,228 array-genotyped individuals from the UK Biobank. In addition to population-level selection effects against genic loci conferring high mortality, we describe genetic burden from potentially pathogenic and previously uncharacterized CNV loci across more than 3,000 quantitative and dichotomous traits, with separate analyses for common and rare classes of variation. Specifically, we highlight the effects of CNVs at two well-known syndromic loci 16p11.2 and 22q11.2, previously uncharacterized variation at 9p23, and several genic associations in the context of acute coronary artery disease and high body mass index. Our data constitute a deeply contextualized portrait of population-wide burden of copy-number variation, as well as a series of dosage-mediated genic associations across the medical phenome.
Subjects
Autistic Disorder/genetics, Chromosome Disorders/genetics, Human Chromosome Pair 9/genetics, Coronary Artery Disease/genetics, DNA Copy Number Variations, DiGeorge Syndrome/genetics, Intellectual Disability/genetics, Phenomics, Single Nucleotide Polymorphism, Biological Specimen Banks, Case-Control Studies, Chromosome Deletion, Human Chromosome Pair 16/genetics, Female, Genetic Loci, Genetic Predisposition to Disease, Genome-Wide Association Study, Genotype, Humans, Male, Phenotype, United Kingdom
ABSTRACT
SUMMARY: Large biobanks linking phenotype to genotype have led to an explosion of genetic association studies across a wide range of phenotypes. Sharing the knowledge generated by these resources with the scientific community remains a challenge due to patient privacy and the vast amount of data. Here, we present Global Biobank Engine (GBE), a web-based tool that enables exploration of the relationship between genotype and phenotype in biobank cohorts, such as the UK Biobank. GBE supports browsing for results from genome-wide association studies, phenome-wide association studies, gene-based tests and genetic correlation between phenotypes. We envision GBE as a platform that facilitates the dissemination of summary statistics from biobanks to the scientific and clinical communities. AVAILABILITY AND IMPLEMENTATION: GBE currently hosts data from the UK Biobank and can be found freely available at biobankengine.stanford.edu.